Random forests are among the most popular machine learning methods thanks to their relatively good accuracy, robustness and ease of use. They also provide two straightforward methods for feature selection:
- Gini Importance or Mean Decrease in Impurity (MDI)
- Permutation Importance or Mean Decrease in Accuracy (MDA)
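
As a rough illustration of the two measures, the sketch below computes both with scikit-learn. It assumes scikit-learn's bundled `load_wine` data, which may differ from the Wine dataset used in the notebook, and the hyperparameters (`n_estimators=200`, `n_repeats=10`, a 30% test split) are arbitrary choices for the example rather than the notebook's settings.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled wine data stands in for the notebook's dataset.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# MDI: impurity decrease accumulated over all splits on each feature,
# averaged across trees and normalized to sum to 1.
mdi = forest.feature_importances_

# MDA: drop in held-out accuracy when a single feature column is shuffled,
# averaged over n_repeats shuffles.
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
mda = perm.importances_mean

# Print features ordered by MDI, with both scores side by side.
ranked = sorted(zip(data.feature_names, mdi, mda), key=lambda row: -row[1])
for name, gini_score, perm_score in ranked:
    print(f"{name:30s} MDI={gini_score:.3f}  MDA={perm_score:.3f}")
```

MDI is read directly from the fitted trees, so it costs nothing extra, whereas MDA needs additional predictions on shuffled data and is usually evaluated on data the forest did not see during training.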
A third, more recent method performs all-relevant feature selection on top of random forests:
- Boruta, conceived by Witold R. Rudnicki and developed by Miron B. Kursa at the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw (ICM UW)
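
The central idea behind Boruta, comparing each real feature against shuffled "shadow" copies, can be sketched in a single pass as shown below. This is a simplified illustration, not the Boruta implementation: the full algorithm repeats the comparison over many iterations and applies a statistical test before confirming or rejecting a feature. The use of scikit-learn's `load_wine` data and of impurity-based importance are assumptions made for brevity.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Assumption: scikit-learn's bundled wine data, used only for illustration.
X, y = load_wine(return_X_y=True)
n_features = X.shape[1]

# Shadow features: each column shuffled independently, which keeps its
# marginal distribution but destroys any relationship with the target.
X_shadow = np.column_stack([rng.permutation(column) for column in X.T])
X_extended = np.hstack([X, X_shadow])

shadow_forest = RandomForestClassifier(n_estimators=200, random_state=0)
shadow_forest.fit(X_extended, y)

importances = shadow_forest.feature_importances_
real_importance = importances[:n_features]
shadow_max = importances[n_features:].max()

# A real feature scores a "hit" in this pass if it beats the best shadow feature.
hits = real_importance > shadow_max
print(f"{hits.sum()} of {n_features} features beat the best shadow feature")
```

A single pass like this is noisy, which is why the complete procedure accumulates hits over repeated runs before deciding which features are relevant.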
The Jupyter Notebook in this repository is organized as follows:
- Describe in detail the three feature importance methods for random forests (algorithm, remarks, and implementation).
- Apply the three methods to two datasets (the Wine dataset and the House Sales dataset).