Just_Stack_Them_Up

A Python implementation for combining models using the stacking methodology. Heavily influenced by the MLWave post http://mlwave.com/kaggle-ensembling-guide/ and by https://github.com/log0/vertebral/blob/master/stacked_generalization.py

It is just a single Python script and is not perfect in any sense. See the source for more options. Change it as needed and send merge requests. I am new to this, so if I did anything wrong, create an issue. Enjoy!!!
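For readers new to stacking, the idea can be sketched in a few lines. This is a minimal illustration of the general technique described in the linked guide, not the code of this script: the helper name `build_blend_features`, the synthetic data, and the choice of `LogisticRegression` as the blender are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split


def build_blend_features(base_clfs, X_train, y_train, X_test, n_folds=5, seed=0):
    """Hypothetical helper: out-of-fold train features and fold-averaged test features."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    blend_train = np.zeros((X_train.shape[0], len(base_clfs)))
    blend_test = np.zeros((X_test.shape[0], len(base_clfs)))
    for j, clf in enumerate(base_clfs):
        test_preds = np.zeros((X_test.shape[0], n_folds))
        for i, (fit_idx, oof_idx) in enumerate(skf.split(X_train, y_train)):
            clf.fit(X_train[fit_idx], y_train[fit_idx])
            # Each training row is predicted by a model that never saw it.
            blend_train[oof_idx, j] = clf.predict_proba(X_train[oof_idx])[:, 1]
            test_preds[:, i] = clf.predict_proba(X_test)[:, 1]
        # Average the per-fold test predictions into one column per base model.
        blend_test[:, j] = test_preds.mean(axis=1)
    return blend_train, blend_test


# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
base_clfs = [
    RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0),
    RandomForestClassifier(n_estimators=10, criterion="gini", random_state=0),
]
blend_train, blend_test = build_blend_features(base_clfs, X_train, y_train, X_test)

# The blender is trained on base-model predictions instead of the raw features.
blender = LogisticRegression()
blender.fit(blend_train, y_train)
final_probs = blender.predict_proba(blend_test)[:, 1]
```

The point of the out-of-fold step is that the blender never sees a prediction made by a model that was trained on the same row, which is what keeps the second stage from simply memorising the training labels.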

Usage


from sklearn.ensemble import RandomForestClassifier

# Each entry is a [classifier, name] pair.
base_classifiers = [[RandomForestClassifier(n_estimators=2, criterion='entropy'), "RF_ENTROPY"],
                    [RandomForestClassifier(n_estimators=2, criterion='gini'), "RF_GINI"]]

blender_Classifiers = [[RandomForestClassifier(n_estimators=2, criterion='entropy'), "BLEND_RF_ENTROPY"],
                       [RandomForestClassifier(n_estimators=2, criterion='gini'), "BLEND_RF_GINI"]]

# Create a stacker object

stack1 = stacker(x_test=X_test, x_train=X_train, y_train=y_train, id_test=id_test,
                 base_clf_list=base_classifiers, blender_clf_list=blender_Classifiers)

Here, id_test is the ID column of the test dataset, which is required.

You can train all base classifiers and create the blended train and test datasets using:


stack1.train_all_base_classifiers()

#OUTPUT


Training classifier [0] [RF_ENTROPY]
Fold [0]
auc_score for fold: 0.601556465779
Fold [1]
auc_score for fold: 0.604232041248
Fold [2]
auc_score for fold: 0.584394983439
Fold [3]
auc_score for fold: 0.603993230476
Fold [4]
auc_score for fold: 0.59159561713
cv_score_mean: 0.597154467614 and cv_score_std: 0.00787329116686
Training classifier [1] [RF_GINI]
Fold [0]
auc_score for fold: 0.590317422147
Fold [1]
auc_score for fold: 0.604126251307
Fold [2]
auc_score for fold: 0.598296676694
Fold [3]
auc_score for fold: 0.593156671037
Fold [4]
auc_score for fold: 0.579752596971
cv_score_mean: 0.593129923631 and cv_score_std: 0.00817897680863
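The summary line for each classifier can be checked by hand from the fold scores. The sketch below recomputes it for RF_ENTROPY from the values printed above. That the script reports a population standard deviation (what `numpy.std` returns by default) is inferred from the numbers, not confirmed from the source.

```python
import statistics

# Per-fold AUC values for RF_ENTROPY, copied from the output above.
fold_auc = [
    0.601556465779,
    0.604232041248,
    0.584394983439,
    0.603993230476,
    0.59159561713,
]

cv_score_mean = statistics.mean(fold_auc)
# pstdev divides by n (population form); stdev would divide by n - 1.
cv_score_std = statistics.pstdev(fold_auc)

print(f"cv_score_mean: {cv_score_mean:.6f} and cv_score_std: {cv_score_std:.6f}")
# prints: cv_score_mean: 0.597154 and cv_score_std: 0.007873
```

With only two trees per forest and a standard deviation of roughly 0.008 across folds, differences between classifiers in the third decimal place are within fold-to-fold noise.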

Or add individual base learners one at a time:


stack1.add_base_classifer([RandomForestClassifier(n_estimators = 3, criterion = 'gini'),"RF_GINI"])

Training classifier [1] [RF_GINI]
Fold [0]
auc_score for fold: 0.598224708298
Fold [1]
auc_score for fold: 0.592026100312
Fold [2]
auc_score for fold: 0.582395075357
Fold [3]
auc_score for fold: 0.593593668134
Fold [4]
auc_score for fold: 0.592920852134
cv_score_mean: 0.591832080847 and cv_score_std: 0.005181679546

Train all blenders using:


stack1.train_all_blenders()

Find the cross-validation AUC scores of all blenders using:


stack1.find_cv_scores_all_blenders()

blender_Name: BLEND_RF_ENTROPY :
Fold 1 CV Score: 0.602361151841
Fold 2 CV Score: 0.625628341057
Fold 3 CV Score: 0.596159573553
Fold 4 CV Score: 0.601513229218
Fold 5 CV Score: 0.596663814886
cv_score_mean: 0.604465222111 and cv_score_std: 0.0108707380607
blender_Name: BLEND_RF_GINI :
Fold 1 CV Score: 0.597490833245
Fold 2 CV Score: 0.614910113814
Fold 3 CV Score: 0.595923008772
Fold 4 CV Score: 0.604254460162
Fold 5 CV Score: 0.591763066213
cv_score_mean: 0.600868296441 and cv_score_std: 0.00809205888706

Print the object at any time to see its current state.


print(stack1)

init:
n_folds: 5 random_seed:0
base classifiers:
1. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'RF_ENTROPY'] 
fold_1_auc:0.601556465779
fold_2_auc:0.604232041248
fold_3_auc:0.584394983439
fold_4_auc:0.603993230476
fold_5_auc:0.59159561713
2. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'RF_GINI'] 
fold_1_auc:0.590317422147
fold_2_auc:0.604126251307
fold_3_auc:0.598296676694
fold_4_auc:0.593156671037
fold_5_auc:0.579752596971
Blender classifiers:
1. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'BLEND_RF_ENTROPY'] 
fold_1_auc:0.602361151841
fold_2_auc:0.625628341057
fold_3_auc:0.596159573553
fold_4_auc:0.601513229218
fold_5_auc:0.596663814886
2. [RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=2, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'BLEND_RF_GINI'] 
fold_1_auc:0.597490833245
fold_2_auc:0.614910113814
fold_3_auc:0.595923008772
fold_4_auc:0.604254460162
fold_5_auc:0.591763066213

Create the final Kaggle submission:

You can also create a weighted submission of all your blenders. See the source for the parameters.


stack1.get_weighted_blender_submission(submission_name= "unw_ble.csv")
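What a weighted blender submission amounts to can be sketched independently of the script. The example below is an assumption about the general idea (a weighted average of each blender's predictions, written next to the test IDs), not the script's actual implementation. The function names, the column names `ID` and `TARGET`, and the sample numbers are all hypothetical.

```python
import csv
import os
import tempfile


def weighted_blend(pred_columns, weights=None):
    """Combine several prediction lists into one weighted average per row."""
    if weights is None:
        weights = [1.0] * len(pred_columns)  # unweighted: plain mean
    if len(weights) != len(pred_columns):
        raise ValueError("need one weight per blender")
    total = float(sum(weights))
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*pred_columns)
    ]


def write_submission(path, ids, preds, id_col="ID", target_col="TARGET"):
    """Write a two-column CSV: test ID and blended prediction."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_col, target_col])
        writer.writerows(zip(ids, preds))


# Made-up predictions from two blenders for three test rows.
ids = [101, 102, 103]
blender_a = [0.20, 0.60, 0.90]
blender_b = [0.40, 0.80, 0.70]

# Trust blender_a three times as much as blender_b.
blended = weighted_blend([blender_a, blender_b], weights=[3, 1])

out_path = os.path.join(tempfile.gettempdir(), "weighted_example.csv")
write_submission(out_path, ids, blended)
```

Leaving `weights` as `None` gives the unweighted mean, which is presumably what a file name like "unw_ble.csv" refers to.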

Contributors

mlwhiz