- Created columns for data in csv file.
- Loaded data from LoanStats csv file.
- Cleaned data.
- Encoded dates as integers using LabelEncoder from sklearn.preprocessing.
- Encoded string data as binary data using get_dummies.
- Split data into Training and Testing.
- Scaled the training and testing data using StandardScaler from sklearn.
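The preprocessing steps above can be sketched as follows. This is a minimal illustration on a tiny made-up DataFrame, not the actual LoanStats columns; the column names here (`issue_d`, `home_ownership`, `loan_amnt`, `loan_status`) are assumptions for demonstration.

```python
# Minimal sketch of the preprocessing pipeline: label-encode dates,
# one-hot encode strings, split, then scale. Columns are illustrative,
# not the real LoanStats schema.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "issue_d": ["Jan-2019", "Feb-2019", "Mar-2019", "Jan-2019"],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE"],
    "loan_amnt": [10000, 25000, 5000, 15000],
    "loan_status": [0, 1, 0, 0],
})

# Encode dates as integers
df["issue_d"] = LabelEncoder().fit_transform(df["issue_d"])
# Encode string data as binary (0/1) columns
df = pd.get_dummies(df, columns=["home_ownership"])

X = df.drop(columns="loan_status")
y = df["loan_status"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit the scaler on the training data only, then transform both sets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set only (and merely transforming the test set) avoids leaking test-set statistics into training.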
- Naive Random Oversampling:
  a. Implemented random oversampling with RandomOverSampler from imblearn.
  b. Trained the Logistic Regression model with LogisticRegression from sklearn.linear_model.
  c. Fit the model using the resampled X and y data.
  d. Displayed the confusion matrix using confusion_matrix from sklearn.metrics.
  e. Calculated the balanced accuracy score using balanced_accuracy_score from sklearn.metrics.
  f. Printed the imbalanced classification report using classification_report_imbalanced from imblearn.metrics.
- SMOTE Oversampling:
  a. Resampled the training data using fit_resample and SMOTE from imblearn.over_sampling.
  b. Trained the Logistic Regression model with LogisticRegression from sklearn.linear_model.
  c. Fit the model using the resampled X and y data.
  d. Displayed the confusion matrix.
  e. Calculated the balanced accuracy score.
  f. Printed the imbalanced classification report.
- Undersampling with Cluster Centroids:
  a. Resampled the training data using fit_resample and ClusterCentroids from imblearn.under_sampling.
  b. Trained the Logistic Regression model.
  c. Fit the model using the resampled X and y data.
  d. Displayed the confusion matrix for the test and prediction data.
  e. Calculated the balanced accuracy score for the test and prediction data.
  f. Printed the imbalanced classification report for the test and prediction data.
- Combination (Over and Under) Sampling with SMOTEENN:
  a. Resampled the training data using fit_resample and SMOTEENN from imblearn.combine.
  b. Trained the Logistic Regression model.
  c. Fit the model using the resampled X and y data.
  d. Displayed the confusion matrix for the test and prediction data.
  e. Calculated the balanced accuracy score for the test and prediction data.
  f. Printed the imbalanced classification report for the test and prediction data.
- FINDINGS:
  - The Combination (Over and Under) Sampling with SMOTEENN Model
  - The Naive Random Oversampling Model
  - The Combination (Over and Under) Sampling with SMOTEENN Model
- Created columns for the data in the csv file
- Loaded data from the LoanStats csv file
- Cleaned data
- Encoded dates as integers using LabelEncoder from sklearn.preprocessing
- Encoded string data as binary data using get_dummies
- Split data into Training and Testing
- Scaled the training and testing data using StandardScaler from sklearn.preprocessing
- Balanced Random Forest Classifier:
  a. Created the balanced random forest model using BalancedRandomForestClassifier from imblearn.ensemble.
  b. Fit the model using the scaled X training data and the y training data.
  c. Made predictions using the scaled X testing data.
  d. Displayed the confusion matrix as a DataFrame.
  e. Calculated the balanced accuracy score.
  f. Printed the imbalanced classification report for the y testing and prediction data.
  g. Calculated feature importance using feature_importances_.
  h. Sorted features by their importance and displayed them as a list.
- Easy Ensemble Classifier:
  a. Created the Easy Ensemble Classifier model using EasyEnsembleClassifier from imblearn.ensemble.
  b. Fit the model using the scaled X training data and the y training data.
  c. Made predictions using the scaled X testing data.
  d. Displayed the confusion matrix as a DataFrame for the y testing and prediction data.
  e. Calculated the balanced accuracy score for the y testing and prediction data.
  f. Printed the imbalanced classification report for the y testing and prediction data.
- FINDINGS:
  - The Easy Ensemble Classifier Model
  - The Easy Ensemble Classifier Model
  - The Easy Ensemble Classifier Model
  - Top three features by importance: 1. total_rec_prncp, 2. last_pymnt_amnt, 3. total_rec_int