Just_Stack_Them_Up

A Python implementation for combining models using stacking (stacked generalization). Heavily influenced by the MLWave post http://mlwave.com/kaggle-ensembling-guide/ and by https://github.com/log0/vertebral/blob/master/stacked_generalization.py

It is just a single Python script and is not perfect in any sense. See the source for more options. Change it according to your needs and send merge requests. I am new to this, so if I did anything wrong, create an issue. Enjoy!
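For readers new to stacking, here is a minimal sketch of the general idea using scikit-learn directly. It is not the code of this script: the synthetic data, the variable names (blend_train, blend_test) and the LogisticRegression blender are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic data stands in for a real train/test split.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0),
    RandomForestClassifier(n_estimators=10, criterion="gini", random_state=0),
]
n_folds = 5
folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)

# One column of meta-features per base model.
blend_train = np.zeros((X_train.shape[0], len(base_models)))
blend_test = np.zeros((X_test.shape[0], len(base_models)))

for j, model in enumerate(base_models):
    test_preds = np.zeros((X_test.shape[0], n_folds))
    for i, (fit_idx, holdout_idx) in enumerate(folds.split(X_train, y_train)):
        model.fit(X_train[fit_idx], y_train[fit_idx])
        # Out-of-fold predictions: each training row is scored by a
        # model that never saw that row during fitting.
        blend_train[holdout_idx, j] = model.predict_proba(X_train[holdout_idx])[:, 1]
        test_preds[:, i] = model.predict_proba(X_test)[:, 1]
    # Average the per-fold test predictions into one test meta-feature.
    blend_test[:, j] = test_preds.mean(axis=1)

# The blender (second-level model) is trained on the meta-features only.
blender = LogisticRegression()
blender.fit(blend_train, y_train)
final_proba = blender.predict_proba(blend_test)[:, 1]
```

The out-of-fold step is what keeps the blender from training on predictions that the base models made for rows they had already seen.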

Usage


from sklearn.ensemble import RandomForestClassifier

# Each entry is [estimator, name].
base_classifiers = [[RandomForestClassifier(n_estimators=2, criterion='entropy'), "RF_ENTROPY"],
                    [RandomForestClassifier(n_estimators=2, criterion='gini'), "RF_GINI"]]

blender_Classifiers = [[RandomForestClassifier(n_estimators=2, criterion='entropy'), "BLEND_RF_ENTROPY"],
                       [RandomForestClassifier(n_estimators=2, criterion='gini'), "BLEND_RF_GINI"]]

# Create a stacker object

stack1 = stacker(x_test=X_test, x_train=X_train, y_train=y_train, id_test=id_test,
                 base_clf_list=base_classifiers, blender_clf_list=blender_Classifiers)

Here, id_test is the ID column of the test dataset. It is required.
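If it helps, this is one possible way to prepare the four inputs from tabular data with pandas. The tables, the column names "ID" and "target", and the conversion to NumPy arrays are all hypothetical; whether stacker wants arrays or DataFrames is not stated here, so check the source.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
feature_cols = ["f1", "f2", "f3"]

# Hypothetical tables; "ID" and "target" are made-up column names.
train_df = pd.DataFrame(rng.normal(size=(8, 3)), columns=feature_cols)
train_df["target"] = rng.integers(0, 2, size=8)
train_df.insert(0, "ID", np.arange(1, 9))

test_df = pd.DataFrame(rng.normal(size=(4, 3)), columns=feature_cols)
test_df.insert(0, "ID", np.arange(9, 13))

# Features and labels for training; the ID column is not a feature.
X_train = train_df[feature_cols].to_numpy()
y_train = train_df["target"].to_numpy()

# Test features, with the ID column kept aside on its own.
X_test = test_df[feature_cols].to_numpy()
id_test = test_df["ID"].to_numpy()
```

Keeping id_test separate from X_test presumably lets the final predictions be matched back to their rows when a submission file is written.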

You can train all base classifiers and create the blended_train and test datasets using:


stack1.train_all_base_classifiers()

#OUTPUT


Training classifier [0] [RF_ENTROPY]
Fold [0]
auc_score for fold: 0.601556465779
Fold [1]
auc_score for fold: 0.604232041248
Fold [2]
auc_score for fold: 0.584394983439
Fold [3]
auc_score for fold: 0.603993230476
Fold [4]
auc_score for fold: 0.59159561713
cv_score_mean: 0.597154467614 and cv_score_std: 0.00787329116686
Training classifier [1] [RF_GINI]
Fold [0]
auc_score for fold: 0.590317422147
Fold [1]
auc_score for fold: 0.604126251307
Fold [2]
auc_score for fold: 0.598296676694
Fold [3]
auc_score for fold: 0.593156671037
Fold [4]
auc_score for fold: 0.579752596971
cv_score_mean: 0.593129923631 and cv_score_std: 0.00817897680863
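As a quick check of how to read the summary line, the RF_ENTROPY numbers above are consistent with the plain mean and the population standard deviation (NumPy's default, ddof=0) of the five fold scores. This is inferred from the printed values, not confirmed from the source.

```python
import numpy as np

# Fold scores copied from the RF_ENTROPY log above.
fold_auc = np.array([0.601556465779, 0.604232041248, 0.584394983439,
                     0.603993230476, 0.59159561713])

cv_score_mean = fold_auc.mean()
cv_score_std = fold_auc.std()  # NumPy default: population std (ddof=0)

print("cv_score_mean:", cv_score_mean, "and cv_score_std:", cv_score_std)
```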

Or add individual base learners:


stack1.add_base_classifer([RandomForestClassifier(n_estimators = 3, criterion = 'gini'),"RF_GINI"])

Training classifier [1] [RF_GINI]
Fold [0]
auc_score for fold: 0.598224708298
Fold [1]
auc_score for fold: 0.592026100312
Fold [2]
auc_score for fold: 0.582395075357
Fold [3]
auc_score for fold: 0.593593668134
Fold [4]
auc_score for fold: 0.592920852134
cv_score_mean: 0.591832080847 and cv_score_std: 0.005181679546

Train all blenders using:


stack1.train_all_blenders()

Find the cross-validation AUC scores of all blenders using:


stack1.find_cv_scores_all_blenders()


blender_Name: BLEND_RF_ENTROPY :
Fold 1 CV Score: 0.602361151841
Fold 2 CV Score: 0.625628341057
Fold 3 CV Score: 0.596159573553
Fold 4 CV Score: 0.601513229218
Fold 5 CV Score: 0.596663814886
cv_score_mean: 0.604465222111 and cv_score_std: 0.0108707380607
blender_Name: BLEND_RF_GINI :
Fold 1 CV Score: 0.597490833245
Fold 2 CV Score: 0.614910113814
Fold 3 CV Score: 0.595923008772
Fold 4 CV Score: 0.604254460162
Fold 5 CV Score: 0.591763066213
cv_score_mean: 0.600868296441 and cv_score_std: 0.00809205888706
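A comparable per-fold AUC report can be produced with scikit-learn's cross_val_score. This is only a sketch of the idea, not what find_cv_scores_all_blenders does internally: the blended training set here is random stand-in data and the variable names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Stand-in for the blended training set: one column of base-model
# probabilities per base classifier, loosely correlated with the labels.
y_train = rng.integers(0, 2, size=300)
noise = rng.normal(scale=0.2, size=(300, 2))
blend_train = np.clip(0.5 + 0.2 * (y_train[:, None] - 0.5) + noise, 0, 1)

blender = RandomForestClassifier(n_estimators=10, criterion="entropy",
                                 random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# scoring="roc_auc" scores each held-out fold by area under the ROC curve.
scores = cross_val_score(blender, blend_train, y_train,
                         cv=folds, scoring="roc_auc")

for i, s in enumerate(scores, start=1):
    print("Fold", i, "CV Score:", s)
print("cv_score_mean:", scores.mean(), "and cv_score_std:", scores.std())
```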

Print the object at any time to see what's up with it.


print(stack1)


init:
n_folds: 5 random_seed:0
base classifiers:
1. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'RF_ENTROPY'] 
fold_1_auc:0.601556465779
fold_2_auc:0.604232041248
fold_3_auc:0.584394983439
fold_4_auc:0.603993230476
fold_5_auc:0.59159561713
2. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'RF_GINI'] 
fold_1_auc:0.590317422147
fold_2_auc:0.604126251307
fold_3_auc:0.598296676694
fold_4_auc:0.593156671037
fold_5_auc:0.579752596971
Blender classifiers:
1. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'BLEND_RF_ENTROPY'] 
fold_1_auc:0.602361151841
fold_2_auc:0.625628341057
fold_3_auc:0.596159573553
fold_4_auc:0.601513229218
fold_5_auc:0.596663814886
2. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'BLEND_RF_GINI'] 
fold_1_auc:0.597490833245
fold_2_auc:0.614910113814
fold_3_auc:0.595923008772
fold_4_auc:0.604254460162
fold_5_auc:0.591763066213

Create the final Kaggle submission:

You can also create a weighted submission of all your blenders. See the source for parameters.


stack1.get_weighted_blender_submission(submission_name= "unw_ble.csv")
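To make "weighted submission" concrete, here is a small sketch of a weighted average over blender predictions written out as ID/prediction rows. The prediction values, the weights and the column names "ID" and "TARGET" are made up; the real parameters and file layout are in the source.

```python
import csv
import io

import numpy as np

id_test = np.array([101, 102, 103, 104])

# Hypothetical positive-class probabilities from two blenders on the test set.
blender_preds = np.array([
    [0.20, 0.70, 0.55, 0.90],   # e.g. BLEND_RF_ENTROPY
    [0.30, 0.60, 0.45, 0.80],   # e.g. BLEND_RF_GINI
])
weights = np.array([1.0, 1.0])  # equal weights give a plain average

# Weighted average across blenders; np.average normalizes by the weight sum.
final_pred = np.average(blender_preds, axis=0, weights=weights)

# Write ID/prediction pairs to an in-memory CSV (column names are assumed).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ID", "TARGET"])
for row_id, p in zip(id_test, final_pred):
    writer.writerow([int(row_id), float(p)])
submission_csv = buffer.getvalue()
```

To write a real file instead of a string, replace the StringIO buffer with open("unw_ble.csv", "w", newline="").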

mlwhiz / just_stack_them_up
