HpTuning

HpTuning is a program that tunes the hyperparameters of Machine Learning (ML) classification algorithms using different optimization techniques. The project is built on top of the mlr package [01] and works as a wrapper around the tuning techniques proposed in the literature.

Technical Requirements

Setup

To install the project, run the following command inside your R session:

devtools::install_github("rgmantovani/HpTuning")

How does it work?

Given a 4-tuple of execution parameters <datafile, algo, tuning, epoch>, the program tunes <algo> on <datafile> using the <tuning> technique. The <epoch> parameter identifies the repetition being executed and sets its seed. Since most of the tuning techniques covered here are stochastic, each one must be run several times with different seeds before the techniques can be compared fairly.
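The role of the seed can be illustrated with a minimal sketch in base R. The helper run_repetition is hypothetical and is not part of HpTuning; it assumes the epoch id is used directly as the seed, and it uses random draws as a stand-in for a stochastic tuning run.

```r
# Hypothetical helper, not part of HpTuning: one stochastic "job" per epoch.
# Assumption: the epoch id is used directly as the random seed.
run_repetition <- function(epoch, n = 3) {
  stopifnot(epoch >= 1, epoch <= 30)  # range documented for the epoch option
  set.seed(epoch)
  runif(n)                            # stand-in for a stochastic tuning run
}

first <- run_repetition(1)
again <- run_repetition(1)
other <- run_repetition(2)

# The same epoch reproduces the same draws; a different epoch does not.
stopifnot(identical(first, again))
stopifnot(!identical(first, other))
```

Repeating a job with the same epoch should therefore give the same result, while different epochs give independent repetitions of the same experiment.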

Each execution (or single job) is saved in a separate folder, organized by its input parameters. The program stores on disk:

  • the final performance values reached by the tuned models;
  • the predictions made by the tuned models;
  • the hyperparameters returned by the optimization process; and
  • the optimization path with all the candidate settings evaluated during the search.
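One way to picture a per-job folder is the sketch below. The directory layout, the file name result.rds, and the functions job_dir and save_job are all assumptions made for illustration; the actual layout used by HpTuning may differ.

```r
# Hypothetical layout: <root>/<datafile>/<algo>/<tuning>/rep<epoch>
job_dir <- function(datafile, algo, tuning, epoch, root = "output") {
  file.path(root, datafile, algo, tuning, paste0("rep", epoch))
}

# Hypothetical helper: create the job folder and store one result object in it.
save_job <- function(dir, result) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, file.path(dir, "result.rds"))
  invisible(dir)
}

# Placeholder result with the four kinds of output listed above.
result <- list(performance = NA, predictions = NULL,
               params = list(), opt_path = NULL)

dir <- job_dir("iris", "classif.rpart", "random", 1, root = tempdir())
save_job(dir, result)
```

Keying the path on all four parameters means that two jobs never write to the same folder unless they are the same job.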

Available Options

There is no restriction on the datafile option: the code runs with any dataset you provide, as long as it is located in the data sub-folder. Note: in every datafile, the target attribute must be the last one and must be labeled Class.

The available options (in the current version) for the other runtime parameters are:
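The dataset convention can be checked before a job is launched. The sketch below works on a data frame already loaded in R; the functions prepare_dataset and check_target are hypothetical helpers, not part of HpTuning, and the built-in iris data is used only as an example.

```r
# Hypothetical helper: move the target column to the end and rename it "Class".
prepare_dataset <- function(data, target) {
  stopifnot(is.data.frame(data), target %in% colnames(data))
  features <- setdiff(colnames(data), target)
  out <- data[, c(features, target), drop = FALSE]
  colnames(out)[ncol(out)] <- "Class"
  out
}

# Hypothetical helper: fail early if the last column is not named "Class".
check_target <- function(data) {
  last <- colnames(data)[ncol(data)]
  if (!identical(last, "Class")) {
    stop("the last attribute must be the target and be named 'Class', found '",
         last, "'")
  }
  invisible(TRUE)
}

prepared <- prepare_dataset(iris, target = "Species")
check_target(prepared)
```

How the prepared data frame is then written to the data sub-folder depends on the file format the project expects, which is not covered here.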

  • algo - ML algorithm to be tuned:

    • "classif.J48": J48 Decision Tree algorithm implemented by the RWeka package;
    • "classif.rpart": CART trees, implemented by the rpart package;
    • "classif.svm": Support Vector Machines, implemented by the e1071 package;
    • "classif.randomForest": Random Forest, implemented by the randomForest package;
    • "classif.ctree": Conditional Inference Trees, implemented by the party package;
    • "classif.xgboost": eXtreme Gradient Boosting, implemented by the xgboost package;
    • "classif.C50": C5.0 Decision Trees, implemented by the C50 package;
    • "classif.glmnet": Generalized Linear Models, implemented by the glmnet package;
    • "classif.kknn": Weighted k-Nearest Neighbors, implemented by the kknn package;
    • "classif.naiveBayes": Naive Bayes, implemented by the e1071 package.
  • tuning - hyperparameter tuning technique:

    • "defaults" - Default hyperparameter values (from the respective R implementations);
    • "random" - Random Search (RS) [02], implemented by the mlr package;
    • "mbo" - Sequential Model-Based Optimization (SMBO) [03], implemented by the mlrMBO package;
    • "irace" - Iterated Racing Algorithm [04], implemented by the irace package;
    • "pso" - Particle Swarm Optimization [05], implemented by the pso package;
    • "ga" - Genetic Algorithm [06], implemented by the GA package;
    • "eda" - Estimation of Distribution Algorithms (EDA) [07], implemented by the copulaedas package.
  • epoch - id of the repetition being executed. It controls the seed for reproducibility and is restricted to the range from 1 to 30.
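The options above can be checked before a job starts. The following is a minimal sketch in base R; the function validate_job and the constant TUNING_OPTIONS are hypothetical names, and the check on algo only tests the "classif." prefix shared by all listed learners rather than the full list.

```r
# Tuning techniques listed above.
TUNING_OPTIONS <- c("defaults", "random", "mbo", "irace", "pso", "ga", "eda")

# Hypothetical validator, not part of HpTuning.
validate_job <- function(algo, tuning, epoch) {
  if (!grepl("^classif\\.", algo)) {
    stop("algo must be a classification learner id such as 'classif.rpart'")
  }
  if (!tuning %in% TUNING_OPTIONS) {
    stop("unknown tuning technique: ", tuning)
  }
  epoch <- suppressWarnings(as.integer(epoch))  # accepts "5" as well as 5
  if (is.na(epoch) || epoch < 1L || epoch > 30L) {
    stop("epoch must be an integer between 1 and 30")
  }
  list(algo = algo, tuning = tuning, epoch = epoch)
}

job <- validate_job("classif.rpart", "random", "5")
```

Failing early on an invalid option avoids spending time on a job whose results could not be compared with the others.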

Running the code

To run the project, call the main script from a shell with the following command:

R CMD BATCH --no-save --no-restore '--args' --datafile=<datafile> --algo=<algo> --tuning=<tuning> \
  --epoch=<epoch> mainHP.R out_job.log &  

This command starts the script in the background and saves its status to an output log file (out_job.log in the example above). You can follow the execution and inspect any errors directly in this file, and you can rename the log file as you wish.
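Inside an R script, arguments passed after '--args' can be read with commandArgs(trailingOnly = TRUE). The sketch below shows one way to turn --key=value arguments into a named list; the function parse_args is a hypothetical helper and is not necessarily how mainHP.R parses its input.

```r
# Hypothetical parser for arguments of the form --key=value.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  args   <- grep("^--[^=]+=", args, value = TRUE)  # keep well-formed pairs only
  keys   <- sub("^--([^=]+)=.*$", "\\1", args)     # text between "--" and "="
  values <- sub("^--[^=]+=", "", args)             # text after the first "="
  as.list(setNames(values, keys))
}

# Example with the four runtime parameters of a job.
opts <- parse_args(c("--datafile=iris", "--algo=classif.rpart",
                     "--tuning=random", "--epoch=1"))
opts$epoch <- as.integer(opts$epoch)  # values arrive as character strings
```

All values are parsed as strings, so numeric options such as epoch need an explicit conversion before use.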

Contact

Rafael Gomes Mantovani ([email protected] / [email protected]) Universidade Tecnológica Federal do Paraná (UTFPR) - Apucarana, Brazil.

References

[01] B. Bischl, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Zachary Jones, Giuseppe Casalicchio. mlr: Machine Learning in R, R package version 2.10. URL https://github.com/mlr-org/mlr.

[02] J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, J. Mach. Learn. Res. 13 (2012) 281–305.

[03] J. Snoek, H. Larochelle, R. P. Adams, Practical Bayesian optimization of machine learning algorithms, in: F. Pereira, C. Burges, L. Bottou, K. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 2951–2959.

[04] M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, T. Stützle, and M. Birattari. The irace package: Iterated Racing for Automatic Algorithm Configuration. Operations Research Perspectives, 2016.

[05] J. Kennedy, R. Eberhart, Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Vol. 4, Perth, Australia, 1995, pp. 1942–1948.

[06] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison Wesley, 1989.

[07] M. Hauschild, M. Pelikan. An introduction and survey of estimation of distribution algorithms, Swarm and Evolutionary Computation (3) (2011) 111–128.
