bhuvaneshravi / disease-prediction-model

A prediction model that uses genetic data for disease classification. Data is extracted from a DNA microarray which measures the expression levels of large numbers of genes simultaneously.

Jupyter Notebook 93.97% Python 6.03%
machine-learning scikit-learn prediction-model python3 gaussian-naive-bayes knn-classifier extra-trees-classifier decision-tree-classifier

disease-prediction-model

A prediction model that uses genetic data for disease classification.

Objective

The goal of this project is to build a model that can classify a disease using a machine learning classifier.

Dataset

Data is extracted from a DNA microarray, which measures the expression levels of large numbers of genes simultaneously. Samples in the datasets represent patients. For each patient, 7070 gene expression values are measured in order to classify the patient’s disease into one of the following classes: EPD, JPA, MED, MGL, RHB.

Data Preprocessing:

The data set is split into training and testing files, ‘pp5i_train.gr.csv’ and ‘pp5i_test.gr.csv’, which are kept locally and loaded into the Python source code using the pandas library. pandas supports data manipulation and analysis and is used here mainly for reading data from CSV files. The data is then labelled and the fold difference is limited to between 2 and 16000. Subsets of the top genes are extracted in sets of 2, 4, 6, 8, 10, 12, 15, 20, 25, and 30 genes with the highest absolute T-value.
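The thresholding and top-gene selection described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: it assumes that "limited to between 2 and 16000" means clipping each expression value into that range, that the T-value is a one-vs-rest Welch t-statistic per class, and the helper names `clip_values`, `abs_t_values`, and `top_genes` are hypothetical.

```python
import numpy as np


def clip_values(X, low=2.0, high=16000.0):
    """Clip every expression value into [low, high] (assumed reading of the limits)."""
    return np.clip(X, low, high)


def abs_t_values(X, y, target):
    """Absolute Welch t-statistic per gene: samples of `target` vs. all other samples."""
    in_cls = X[y == target]
    rest = X[y != target]
    mean_diff = in_cls.mean(axis=0) - rest.mean(axis=0)
    # Unbiased variances divided by group sizes give the squared standard error.
    se = np.sqrt(in_cls.var(axis=0, ddof=1) / len(in_cls)
                 + rest.var(axis=0, ddof=1) / len(rest))
    return np.abs(mean_diff) / np.maximum(se, 1e-12)


def top_genes(X, y, n_top):
    """Sorted union of the n_top highest-|t| gene indices for each class."""
    selected = set()
    for label in np.unique(y):
        scores = abs_t_values(X, y, label)
        selected.update(np.argsort(scores)[::-1][:n_top].tolist())
    return sorted(selected)


# Synthetic stand-in for the microarray: 30 patients x 40 genes, 5 classes.
rng = np.random.default_rng(0)
labels = np.array(["EPD", "JPA", "MED", "MGL", "RHB"])
y = np.repeat(labels, 6)
X = rng.normal(500.0, 50.0, size=(30, 40))
for i, label in enumerate(labels):
    X[y == label, i] += 2000.0  # make gene i strongly informative for class i

X = clip_values(X)
selected = top_genes(X, y, n_top=1)
print(selected)
```

With the real files, `X` and `y` would come from the CSVs loaded with pandas, and `n_top` would take each of the subset sizes listed above.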

Implementation in Python:

The prediction model is developed using several algorithms: Gaussian Naïve Bayes, K-Nearest Neighbors, Extra Tree, a Multi-Layer Perceptron neural network, and Decision Tree classifiers. The idea is to pick an algorithm and a particular subset of the gene pool, then tune it with various regularization schemes, which improves the model's learning ability in a gradual, additive fashion.
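A comparison of this kind might look like the sketch below, using scikit-learn. It is illustrative only: the data is synthetic (generated with `make_classification`), the hyperparameters are arbitrary, 5-fold cross-validation is an assumption about the evaluation, and `ExtraTreesClassifier` is assumed to be the "Extra Tree" model meant here (scikit-learn also offers a single-tree `ExtraTreeClassifier`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class data standing in for a 25-gene subset of the microarray.
X, y = make_classification(
    n_samples=100, n_features=25, n_informative=10, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Hyperparameters here are placeholders, not the project's tuned values.
models = {
    "GaussianNB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:12s} {score:.3f}")
```

Scoring every model on the same folds keeps the comparison fair; the ranking on synthetic data says nothing about the ranking on the real gene samples.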

Accuracy:

The Extra Tree classifier is particularly effective on this gene sample, with an optimal k value of 25. Its classification is correct about 95% of the time (validated against a pre-defined validation dataset fed into the model during the train/test phase).
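One way such an optimal k could be found is sketched below, assuming k refers to the number of top genes kept. The data is synthetic, the 70/30 split is arbitrary, and scikit-learn's `SelectKBest` with the ANOVA F-score (`f_classif`) is swapped in for the T-value ranking, so this shows the search procedure rather than the project's exact method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic 5-class data: 150 samples x 200 "genes", 12 of them informative.
X, y = make_classification(
    n_samples=150, n_features=200, n_informative=12, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

subset_sizes = [2, 4, 6, 8, 10, 12, 15, 20, 25, 30]
val_accuracy = {}
for k in subset_sizes:
    # The pipeline fits the gene selection on training data only,
    # so the validation samples never influence which genes are kept.
    model = make_pipeline(
        SelectKBest(f_classif, k=k),
        ExtraTreesClassifier(n_estimators=100, random_state=0),
    )
    model.fit(X_train, y_train)
    val_accuracy[k] = model.score(X_val, y_val)

best_k = max(val_accuracy, key=val_accuracy.get)
print(best_k, round(val_accuracy[best_k], 3))
```

Putting the selection step inside the pipeline avoids leaking validation data into the gene ranking, which would otherwise inflate the reported accuracy.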

Visualisation of Model Performance and Validation:

Results/Inference:

In this project, we developed and compared several machine learning classifiers for predicting disease using a dataset collected from a gene microarray. The classifiers are trained on the labelled training gene samples and used to predict the provided unlabelled test sample. The Extra Tree Classifier was identified as the most efficient of them, with the best accuracy rate. Based on the proposed classification model, disease prediction can be performed for any sample collected from the microarray, helping to diagnose the patient efficiently.

Please refer to the (unpublished) research paper Disease Prediction on Genetic Microarray Data.pdf for further explanation.

Contributions:

  1. Bhuvaneshwaran Ravi
  2. Serlin Tamilselvam
  3. Kameswaran Rangasamy
  4. Jayashree Srinivasan
