

Classification-with-Health-Records

Dataset Description: This dataset contains information about more than 100,000 medical appointments of patients from different neighborhoods in Brazil. It addresses an important question: why does a patient who books a doctor's appointment and receives all the instructions still fail to show up (a "no-show")?

Part I: Data Preparation

    This includes:
  • Read in the data CSV file
  • Cleaned up the data column names
  • Removed records with erroneous entries (e.g., negative ages; see how others have handled this on Kaggle)
  • Created a test set of 20k records that is not touched again for the remainder of this project until Part III. Used stratified sampling on the No-show variable so that the test and training sets have the same class proportions. Saved the train and test sets as CSV files in the processed_data directory.
  • Plotted the No-show variable against the other variables in the dataset as part of exploratory data analysis
  • Created a preprocessing pipeline using scikit-learn to prepare the data for the ML algorithms used later. At a minimum, it standardizes numerical variables and transforms categorical variables into one or more numerical values. Other transformations that seem useful (e.g., logarithmic transformations) can also be applied.
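A minimal sketch of the cleanup and the stratified hold-out, assuming pandas and scikit-learn. The cleaned column names `age` and `no_show` are assumptions about the Kaggle columns after renaming, and the helper names and the `train.csv`/`test.csv` file names are hypothetical. Passing an integer `test_size` to `train_test_split` holds out exactly that many rows, and `stratify` keeps the No-show proportions equal in both splits.

```python
import os
import re

import pandas as pd
from sklearn.model_selection import train_test_split


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case names and collapse non-alphanumeric runs to '_' (e.g. 'No-show' -> 'no_show')."""
    out = df.copy()
    out.columns = [re.sub(r"[^0-9a-zA-Z]+", "_", col).strip("_").lower() for col in out.columns]
    return out


def split_train_test(df: pd.DataFrame, target: str = "no_show",
                     test_size: int = 20_000, seed: int = 42):
    """Drop rows with negative ages, then hold out a test set stratified on the target."""
    df = df[df["age"] >= 0]  # erroneous entries; further filters could be added here
    train, test = train_test_split(df, test_size=test_size, stratify=df[target], random_state=seed)
    return train, test


def save_splits(train: pd.DataFrame, test: pd.DataFrame, out_dir: str = "processed_data") -> None:
    """Write both splits as CSV files (file names are hypothetical)."""
    os.makedirs(out_dir, exist_ok=True)
    train.to_csv(os.path.join(out_dir, "train.csv"), index=False)
    test.to_csv(os.path.join(out_dir, "test.csv"), index=False)
```

Typical use would be `train, test = split_train_test(clean_column_names(pd.read_csv(path)))` followed by `save_splits(train, test)`, where `path` points at the downloaded dataset.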
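For the exploratory plots, one simple view is the no-show rate within each value of a variable. The sketch below assumes the cleaned `no_show` column holds the strings `'Yes'`/`'No'`, with `'Yes'` meaning the patient missed the appointment; the function name is hypothetical.

```python
import pandas as pd


def no_show_rate_by(df: pd.DataFrame, column: str, target: str = "no_show") -> pd.Series:
    """Fraction of appointments with target == 'Yes' for each value of `column`, highest first."""
    missed = df[target] == "Yes"
    return missed.groupby(df[column]).mean().sort_values(ascending=False)
```

Calling `.plot.bar()` on the result draws the chart (pandas plotting needs matplotlib installed), e.g. for an assumed `sms_received` column. A continuous variable such as age would be binned first, for example with `pd.cut`, and the binned column passed in.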
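One way to assemble the preprocessing step is a scikit-learn `ColumnTransformer` that standardizes numeric columns, log-transforms and then standardizes a skewed column, and one-hot encodes categorical columns. The column lists are assumptions, not the project's actual choices; in particular `days_waiting` is a hypothetical engineered feature (such as the gap between the scheduling and appointment dates).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical column groups; adjust to the cleaned column names in the real data.
NUMERIC_COLS = ["age"]
SKEWED_COLS = ["days_waiting"]  # non-negative, right-skewed counts
CATEGORICAL_COLS = ["gender", "neighbourhood"]


def build_preprocessor() -> ColumnTransformer:
    """Scale numeric columns, log1p-then-scale skewed columns, one-hot encode categoricals."""
    log_then_scale = Pipeline([
        ("log", FunctionTransformer(np.log1p)),
        ("scale", StandardScaler()),
    ])
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC_COLS),
        ("log_num", log_then_scale, SKEWED_COLS),
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
```

Wrapping it together with a classifier, as in `Pipeline([("prep", build_preprocessor()), ("clf", model)])`, means the scaling and encoding are learned only from the training folds during cross-validation.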
Part II: Classification Methods

    Here are the steps involved in this part:
  • With sklearn, fit a DecisionTree, a RandomForest, a linear SVM, and an SVM with a radial basis function (RBF) kernel to the transformed data; for now, each method uses its default parameters.
  • Used 10-fold cross-validation to estimate the performance of each of these methods, with both accuracy and AUC as metrics.
  • Based on these results, chose two of the ML methods and fit a model with each, using 5-fold cross-validation for model selection nested inside 10-fold cross-validation for model assessment.
  • Implemented gradient descent for a linear SVM and tested it on the training set.
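A sketch of the first comparison: the four default models scored with stratified 10-fold cross-validation on both accuracy and ROC AUC via scikit-learn's `cross_validate`. For AUC, the scorer falls back to `decision_function` for the SVMs, so `probability=True` is not required. The demo uses synthetic, imbalanced data from `make_classification` as a stand-in for the real features; in the project each model would be wrapped in a `Pipeline` with the preprocessor. The helper name `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Default hyperparameters; random_state only fixes the randomness of the tree-based models.
MODELS = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "linear_svm": LinearSVC(),
    "rbf_svm": SVC(kernel="rbf"),
}


def compare_models(X, y, models=MODELS, n_splits=10, seed=0):
    """Return {name: (mean accuracy, mean ROC AUC)} over stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
        results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
    return results


if __name__ == "__main__":
    # Imbalanced synthetic classes as a stand-in for the No-show target.
    X_demo, y_demo = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
    for name, (acc, auc) in compare_models(X_demo, y_demo).items():
        print(f"{name:>14}: accuracy={acc:.3f}  AUC={auc:.3f}")
```

With imbalanced classes, a model that rarely predicts a no-show can still reach high accuracy, which is why AUC is reported alongside it.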
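Nested cross-validation can be sketched by wrapping a `GridSearchCV` (inner 5-fold loop, model selection) inside `cross_val_score` (outer 10-fold loop, model assessment), so each outer test fold is never seen by the hyperparameter search. The function name and the parameter grid in the demo are arbitrary examples, not the project's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def nested_cv_auc(estimator, param_grid, X, y, seed=0):
    """Mean and std of ROC AUC over 10 outer folds, tuning hyperparameters with 5 inner folds."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # The search object is itself an estimator: every outer fit re-runs the inner grid search.
    search = GridSearchCV(estimator, param_grid, cv=inner, scoring="roc_auc")
    scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
    return scores.mean(), scores.std()


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=400, random_state=0)
    mean_auc, std_auc = nested_cv_auc(
        DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4, 8, None]}, X_demo, y_demo
    )
    print(f"nested CV AUC: {mean_auc:.3f} +/- {std_auc:.3f}")
```

The outer score estimates how well the whole tuning procedure generalizes; fitting the same `GridSearchCV` once on the full training set then yields the final model and its `best_params_`.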
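A minimal sketch of a linear SVM trained by gradient descent, assuming the usual primal objective (lam/2)·‖w‖² + (1/n)·Σ max(0, 1 - yᵢ(w·xᵢ + b)), labels encoded as -1/+1 (e.g. a no-show as +1), and full-batch updates with a fixed learning rate. The hinge loss has a kink at margin 1, so the update uses a subgradient in which only points with margin below 1 contribute. Function names and the values of `lam`, `lr`, and `n_iters` are hypothetical choices.

```python
import numpy as np


def fit_linear_svm_gd(X, y, lam=0.01, lr=0.1, n_iters=1000):
    """Minimise (lam/2)*||w||^2 + mean(max(0, 1 - y*(X @ w + b))) by full-batch subgradient descent.

    y must contain -1/+1 labels. Returns the weight vector w and the bias b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        # Subgradient of the objective; the bias is not regularised.
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_linear_svm(X, w, b):
    """Predict -1/+1 labels from the sign of the decision function."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_demo = np.vstack([rng.normal(2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2))])
    y_demo = np.array([1] * 50 + [-1] * 50)
    w, b = fit_linear_svm_gd(X_demo, y_demo)
    print("training accuracy:", (predict_linear_svm(X_demo, w, b) == y_demo).mean())
```

Because one step size is shared by all coordinates, the features should be standardized first (as the Part I pipeline does) for the updates to behave well.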
Part III: Ensembles

    Here are the steps involved in this part:
  • Trained an AdaBoost classifier and compared its performance to the results obtained in Part II, using 10-fold cross-validation as before
  • Trained an XGBoost classifier and compared its performance to the results obtained in Part II
  • Chose a set of about five classifiers (e.g., decision trees of diverse depths, linear SVMs over diverse subsets of features, RBF kernels with diverse bandwidths, random forests with diverse numbers of trees in their ensemble; be creative!). Wrote a function that, given a training set, does the following:
        • Creates a validation set using 20% of the training set
        • Trains each of the chosen classifiers on the remaining 80% of the training set
        • Using the validation set, creates a new dataset whose features are the predictions made by each of the chosen classifiers
        • Trains a logistic regression classifier on this new dataset to blend the predictions
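The AdaBoost comparison can use the same kind of stratified 10-fold setup. If Part II used a `StratifiedKFold` with the same `n_splits`, `shuffle=True`, and `random_state` on the same data, the folds are identical, so the scores are compared on the same splits. XGBoost's `XGBClassifier` follows the scikit-learn estimator interface and could be passed to the same function; it is left out here so the sketch depends only on scikit-learn. The helper name and seed are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate


def cv_accuracy_auc(model, X, y, n_splits=10, seed=0):
    """Mean accuracy and ROC AUC of `model` over stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
    return scores["test_accuracy"].mean(), scores["test_roc_auc"].mean()


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=500, random_state=0)
    # By default, AdaBoost boosts depth-1 decision trees (stumps).
    acc, auc = cv_accuracy_auc(AdaBoostClassifier(random_state=0), X_demo, y_demo)
    print(f"AdaBoost: accuracy={acc:.3f}  AUC={auc:.3f}")
```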
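The blending function could be sketched as follows, assuming scikit-learn. The five base models (two tree depths, a random forest, and RBF SVMs with two values of `gamma`, i.e. two bandwidths) are only an example set, and all function names are hypothetical. As a design choice, each model contributes a continuous score (`decision_function` where available, otherwise the positive-class probability) instead of a hard 0/1 label, which gives the logistic regression more information; calling `.predict` in `_score` would follow the description literally.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Example base models; diversity is the point, not these exact settings.
BASE_MODELS = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    DecisionTreeClassifier(max_depth=8, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
    SVC(kernel="rbf", gamma=0.1),
    SVC(kernel="rbf", gamma=1.0),
]


def _score(model, X):
    """Continuous positive-class score: decision_function if available, else predict_proba."""
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]


def fit_blender(X, y, base_models=BASE_MODELS, val_size=0.2, seed=0):
    """Hold out a validation set, fit base models on the rest, then fit a logistic-regression blender."""
    X_fit, X_val, y_fit, y_val = train_test_split(
        X, y, test_size=val_size, stratify=y, random_state=seed
    )
    fitted = [clone(m).fit(X_fit, y_fit) for m in base_models]
    # New dataset: one column per base model, one row per validation example.
    meta_X = np.column_stack([_score(m, X_val) for m in fitted])
    blender = LogisticRegression().fit(meta_X, y_val)
    return fitted, blender


def predict_blend(fitted, blender, X):
    """Blended probability of the positive class."""
    meta_X = np.column_stack([_score(m, X) for m in fitted])
    return blender.predict_proba(meta_X)[:, 1]


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=1000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
    fitted, blender = fit_blender(X_tr, y_tr)
    print("blended AUC:", roc_auc_score(y_te, predict_blend(fitted, blender, X_te)))
```

Fitting the blender only on the held-out 20% keeps it from learning on predictions the base models made for their own training rows.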
    

Contributors: huacenxu
