erdogant / hgboost

hgboost is a Python package for hyperparameter optimization of xgboost, catboost, or lightboost using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to classification and regression tasks.

Home Page: http://erdogant.github.io/hgboost

License: Other

Topics: gridsearch, hyperoptimization, xgboost, catboost, lightboost, crossvalidation, machine-learning, python

hgboost's Introduction

hgboost - Hyperoptimized Gradient Boosting



hgboost is short for Hyperoptimized Gradient Boosting and is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to classification and regression tasks.

hgboost is fun because:

* 1. It hyperoptimizes the parameter space using a Bayesian approach.
* 2. It determines the best-scoring model(s) using k-fold cross-validation.
* 3. It evaluates the best model on an independent evaluation set.
* 4. It fits the model on the entire input dataset using the best parameters.
* 5. It works for classification and regression.
* 6. It can create a super-hyperoptimized model from an ensemble of all individually optimized models.
* 7. It returns the model, the search space, and the test/evaluation results.
* 8. It makes insightful plots.

⭐️ Star this repo if you like it ⭐️


Blogs

Medium Blog 1: The Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting.

Medium Blog 2: Create Explainable Gradient Boosting Classification models using Bayesian Hyperparameter Optimization.


The documentation pages contain detailed information about how hgboost works, along with many examples.


Colab Notebooks

  • Regression example: Open in Colab

  • Classification example: Open in Colab


Schematic overview of hgboost

Installation

Create and activate a conda environment:

conda create -n env_hgboost python=3.8
conda activate env_hgboost

Install from PyPI:

pip install hgboost
pip install -U hgboost # Force update

Import the hgboost package:

from hgboost import hgboost

Examples

Classification example for xgboost, catboost and lightboost:

# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204) 
# [hgboost] >Hyperparameter optimization..
#  100% |----| 500/500 [04:39<05:21,  1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00,  4.27s/it]
# [hgboost] >Evaluate best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameter settings.
# Plot the ensemble classification validation results
hgb.plot_validation()
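Regression works the same way. The sketch below is hedged: method='xgb_reg' and eval_metric='rmse' are assumed by analogy with the method='xgb_clf' call shown in the issues further down, and are not confirmed API (the docs may expose a dedicated regression method instead); X and y are your own data.

# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, cv=5, test_size=0.2, val_size=0.2, random_state=42)

# Fit xgboost for regression; 'xgb_reg' is an assumption, mirroring 'xgb_clf'
results = hgb.xgboost(X, y, method='xgb_reg', eval_metric='rmse')

# Same plotting helpers as in the classification example
hgb.plot_validation()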


References

* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost

Maintainers

  • erdogant

Contribute

  • Contributions are welcome.

License

See LICENSE for details.

Coffee

  • If you wish to buy me a coffee for this work, it is very much appreciated :)

hgboost's People

Contributors

  • bneijt
  • erdogant


hgboost's Issues

Error in RMSE calculation

if self.eval_metric == 'rmse':
    loss = mean_squared_error(y_test, y_pred)

mean_squared_error in sklearn returns the MSE; use mean_squared_error(y_true, y_pred, squared=False) to obtain the RMSE.
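A runnable illustration of the difference (squared=False exists since scikit-learn 0.22; very recent releases deprecate it in favor of root_mean_squared_error):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

mse = mean_squared_error(y_true, y_pred)                  # mean squared error
rmse = mean_squared_error(y_true, y_pred, squared=False)  # root mean squared error
assert np.isclose(rmse, np.sqrt(mse))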

Specific data results problem

Your package is very convenient and effective, but if I want to redraw a plot, how can I get the underlying data?
Thanks for your help.
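For context: the fit methods return a results dictionary whose 'params' and 'model' keys are shown in the issues below; a hedged sketch of inspecting it for the raw numbers behind the plots (any further keys should be checked with results.keys()):

from hgboost import hgboost

hgb = hgboost(max_eval=10, cv=5, random_state=42)
results = hgb.xgboost(X, y, pos_label=1)   # X, y: your own data

print(results.keys())      # discover which raw results are available
print(results['params'])   # best hyperparameters behind the plots
model = results['model']   # refitted best model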

import error during import hgboost

After installing hgboost, importing it fails. Could you please help me out? Details are as follows:

ImportError                               Traceback (most recent call last)
----> 1 from hgboost import hgboost

C:\ProgramData\Anaconda3\lib\site-packages\hgboost\__init__.py
----> 1 from hgboost.hgboost import hgboost
      2
      3 from hgboost.hgboost import (
      4     import_example,
      5 )

C:\ProgramData\Anaconda3\lib\site-packages\hgboost\hgboost.py
      9 import classeval as cle
     10 from df2onehot import df2onehot
---> 11 import treeplot as tree
     12 import colourmap
     13

C:\ProgramData\Anaconda3\lib\site-packages\treeplot\__init__.py
----> 1 from treeplot.treeplot import (
      2     plot,
      3     randomforest,
      4     xgboost,
      5     lgbm,

C:\ProgramData\Anaconda3\lib\site-packages\treeplot\treeplot.py
     14 import numpy as np
     15 from sklearn.tree import export_graphviz
---> 16 from sklearn.tree.export import export_text
     17 from subprocess import call
     18 import matplotlib.image as mpimg

ImportError: cannot import name 'export_text' from 'sklearn.tree.export'

thanks a lot!
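For context: the sklearn.tree.export module was removed in scikit-learn 0.23, while export_text has been importable from sklearn.tree itself since 0.21, so upgrading treeplot (or patching its import) is the likely fix. A minimal check in a modern environment:

# Works on scikit-learn >= 0.21; sklearn.tree.export no longer exists
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf))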

Test:Validation:Train split

Shouldn't the new train-test split be test_size=self.test_size/(1-self.val_size) in def _HPOpt(self)? The shape of X was already reduced in _set_validation_set(self, X, y).

I'm assuming that the test, train, and validation set ratios are defined on the original data.
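The arithmetic behind the suggestion, with illustrative numbers (not hgboost code): with val_size=0.2 and test_size=0.2, the validation split removes 20% of the samples first, so taking test_size=0.2 of the remainder yields only 16% of the original data; dividing by 1 - val_size restores the intended 20%.

N = 1000
val_size, test_size = 0.2, 0.2

n_val = int(N * val_size)                               # 200 samples held out first
n_rest = N - n_val                                      # 800 samples remain

naive_test = int(n_rest * test_size)                    # 160 -> 16% of the original
fixed_test = int(n_rest * test_size / (1 - val_size))   # 200 -> 20% as intended
print(naive_test, fixed_test)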

Xgboost parameter

After calling hgb.plot_params(), the learning rate is reported as 796, which does not seem reasonable.
Can I see the actual model parameters found by the hyperparameter optimization?

(screenshot of the hgb.plot_params() output)
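A value of 796 is almost certainly a hyperopt index rather than a learning rate: hp.choice returns the position of the selected candidate, and hyperopt's space_eval maps it back to the actual value. A hedged sketch of that mapping (the search space below is hypothetical, not hgboost's actual one):

from hyperopt import hp, space_eval

# Hypothetical search space; hgboost's real space differs
space = {'learning_rate': hp.choice('learning_rate', [0.001 * i for i in range(1, 1000)])}

best = {'learning_rate': 796}    # an index, as reported by fmin / plot_params
print(space_eval(space, best))   # -> learning_rate of about 0.797: the actual value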

hgboost installation error

When installing in Colab, the following error occurs:
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Could you please help?

HP Tuning: best_model uses different parameters from those that were reported as best ones

I used hgboost for optimizing the hyper-parameters of my XGBoost model as described in the API References with the following parameters:

hgb = hgboost()
results = hgb.xgboost(X_train, y_train, pos_label=1, method='xgb_clf', eval_metric='logloss')

As noted in the documentation, results is a dictionary that, among other things, returns the best performing parameters (best_params) and the best performing model (model). However, the parameters that the best performing model uses are different from what the function returns as best_params:

best_params

'params': {'colsample_bytree': 0.47000000000000003,
  'gamma': 1,
  'learning_rate': 534,
  'max_depth': 49,
  'min_child_weight': 3.0,
  'n_estimators': 36,
  'subsample': 0.96}

model

'model': XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
               colsample_bynode=1, colsample_bytree=0.47000000000000003,
               enable_categorical=False, gamma=1, gpu_id=-1,
               importance_type=None, interaction_constraints='',
               learning_rate=0.058619090164329916, max_delta_step=0,
               max_depth=54, min_child_weight=3.0, missing=nan,
               monotone_constraints='()', n_estimators=200, n_jobs=-1,
               num_parallel_tree=1, predictor='auto', random_state=0,
               reg_alpha=0, reg_lambda=1, scale_pos_weight=0.5769800646551724,
               subsample=0.96, tree_method='exact', validate_parameters=1,
               verbosity=0),

As you can see, for example, max_depth=49 in the best_params, but the model uses max_depth=54 etc.

Is this a bug or the intended behavior? In case of the latter, I'd really appreciate an explanation!

My setup:

  • OS: WSL (Ubuntu)
  • Python: 3.9.7
  • hgboost: 1.0.0

Treeplot failure - missing graphviz dependency

I'm running through the example classification notebook now, and the treeplot fails to render, with the following warning:

(screenshot of the treeplot warning)

It seems that graphviz, being a compiled C library, is not bundled with pip (it is included when installing treeplot/graphviz via conda).

Since we have no way to add this to the pip requirements, perhaps a sentence could be added to the Installation instructions warning that graphviz must already be available or be installed separately.

(Note that the suggested apt command for Linux is not strictly necessary, because pydot does get installed with treeplot via pip.)
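Since pip cannot ship the compiled graphviz binaries, a runtime check along these lines could accompany such a sentence (shutil.which is standard library; 'dot' is the graphviz renderer that treeplot appears to shell out to, given the subprocess import in the traceback above):

import shutil

# Warn early if the graphviz 'dot' executable is missing from PATH
if shutil.which('dot') is None:
    print("graphviz not found; install it system-wide (OS package manager "
          "or conda) before rendering tree plots.")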

Hgboost on GPU

Does this package have the capability to run on a GPU? If so, where can the relevant parameter be specified?
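hgboost's documentation does not mention a GPU switch; the underlying boosters handle this themselves. As a hedged illustration with plain XGBoost (outside hgboost, since no documented pass-through is confirmed here), GPU training is enabled through the estimator's own parameters:

from xgboost import XGBClassifier

# XGBoost >= 2.0: select the device explicitly; older releases used tree_method='gpu_hist'
clf = XGBClassifier(device='cuda', tree_method='hist')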

TypeError: LGBMRegressor.fit() got an unexpected keyword argument 'early_stopping_rounds'

First of all, thank you for developing this software, which makes it easy to find the optimal parameters for gradient boosting algorithms.

It seems that LightGBM has been updated and 4.0.x is now available. I am using Google Colab, and installing the package, both from PyPI and directly from GitHub, produces the error in the title.

I hope it can be fixed easily. Best regards.
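For context, LightGBM 4.0 removed the early_stopping_rounds keyword from fit(); early stopping now goes through callbacks. A minimal sketch of the new-style call with synthetic data:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
X_val, y_val = rng.normal(size=(50, 5)), rng.normal(size=50)

reg = lgb.LGBMRegressor(n_estimators=500)
# LightGBM >= 4.0: pass a callback instead of early_stopping_rounds=...
reg.fit(X, y, eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=20)])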
