ProtocoleDataScience
Python 3.6 package covering feature engineering, feature importance, machine learning for classification, hyperparameter optimization and model performance metrics. It is currently based solely on the ATP.csv Kaggle dataset.
Repository contains:
- pre_proc_lib.py
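- ML_lib_hyperparams_optim.ipynb (Jupyter Notebook) --> hyperparameter optimization using scikit-learn's GridSearchCV and RandomizedSearchCV
- ML_Models_Performances_Lib.ipynb (Jupyter Notebook) --> models and their performance metrics, implemented with the scikit-learn (sklearn) library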
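The following is a minimal sketch of the kind of workflow the two notebooks describe: tuning a classifier with GridSearchCV and RandomizedSearchCV, then scoring the best estimators on held-out data. It is not taken from the notebooks. The synthetic data from `make_classification` stands in for the preprocessed ATP features. The choice of `RandomForestClassifier`, the parameter ranges and the accuracy/F1 metrics are all illustrative assumptions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Synthetic stand-in for preprocessed ATP features (assumption: binary target).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0)

# Exhaustive search: every combination in the grid is cross-validated.
grid_search = GridSearchCV(
    model,
    param_grid={"n_estimators": [20, 50], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
grid_search.fit(X_train, y_train)

# Random search: n_iter candidates are sampled from the given distributions/lists.
random_search = RandomizedSearchCV(
    model,
    param_distributions={"n_estimators": randint(10, 60), "max_depth": [3, 5, None]},
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
random_search.fit(X_train, y_train)

# Evaluate each tuned model on the held-out test split.
results = {}
for name, search in (("grid", grid_search), ("random", random_search)):
    y_pred = search.best_estimator_.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    print(name, results[name])
```

Grid search is exhaustive but grows multiplicatively with each parameter. Random search caps the cost at `n_iter` fits per fold, so it scales better to wide search spaces.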
Requirement:
- Download the ATP.csv dataset from Kaggle
Coming soon:
- generalization of the package to handle any dataset, using a YAML configuration file that lists the column names to drop from the dataset before any data processing
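This feature does not exist yet. The following minimal sketch shows one way such a configuration could be consumed, assuming PyYAML and pandas are available. The config key `columns_to_drop`, the function `drop_configured_columns` and the column names are hypothetical and are not part of the current package.

```python
import pandas as pd
import yaml

# Hypothetical YAML configuration listing columns to drop before preprocessing.
CONFIG_TEXT = """
columns_to_drop:
  - match_id
  - comment
"""


def drop_configured_columns(df: pd.DataFrame, config_text: str) -> pd.DataFrame:
    """Return a copy of df without the columns listed under `columns_to_drop`.

    Hypothetical helper: columns named in the config but absent from df are ignored.
    """
    config = yaml.safe_load(config_text) or {}
    to_drop = config.get("columns_to_drop", [])
    return df.drop(columns=to_drop, errors="ignore")


if __name__ == "__main__":
    df = pd.DataFrame({"match_id": [1, 2], "comment": ["a", "b"], "winner_rank": [3, 7]})
    print(list(drop_configured_columns(df, CONFIG_TEXT).columns))
```

Keeping the column list in a config file means no code changes are needed when switching to a new dataset. Passing `errors="ignore"` lets one config be shared across datasets that lack some of the listed columns.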