mikel-ua / bigdata_analysis_breastcancer

BigData Analysis - Breast Cancer Wisconsin Dataset (R, PCA, Machine Learning, ggplot2, dplyr): an exploration of Kaggle's 'Breast Cancer Wisconsin (Original)' dataset. Objective: develop a classification algorithm for benign/malignant tumor detection, using PCA for dimensionality reduction and k-NN for classification. The reduced dataset retains over 95% of the original data's variance.

big-data machine-learning pca-analysis python r k-nn dplyr ggplot2

bigdata_analysis_breastcancer's Introduction

BigData_Analysis_BreastCancerWisconsin_Dataset

Attribute selection and preparation of the "Breast Cancer Wisconsin (Original)" dataset for further analysis. Dataset: https://www.kaggle.com/buddhiniw/breast-cancer-prediction/data

This is a data analysis project using the "Breast Cancer Wisconsin (Original)" dataset, obtained from Kaggle. The dataset describes physical characteristics observed in cells potentially affected by breast cancer, and was collected between January 1989 and November 1991 by the University of Wisconsin Hospitals, Madison.

The primary objective of this project was to develop a classification algorithm that distinguishes benign from malignant breast tumors based on cell characteristics. With early cancer detection in mind, I used a supervised learning approach. Here's a summary of the project:

Data Exploration:

I began by exploring the dataset's structure and its attributes, which describe various physical characteristics of breast cancer cells. I identified data quality issues and addressed missing values.
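The README does not say how the missing values were handled. As one possible approach, here is a minimal Python sketch (for illustration only, not the repository's R code) that counts missing entries per column and fills them with the column median. The `?` marker, the column names, and the sample rows are assumptions made for the example.

```python
from statistics import median

MISSING = "?"  # assumed missing-value marker; adjust to match the actual file


def count_missing(rows, columns):
    """Return {column: number of rows whose value equals MISSING}."""
    return {c: sum(1 for r in rows if r[c] == MISSING) for c in columns}


def impute_median(rows, columns):
    """Return new rows with MISSING replaced by the column median.

    Observed values are converted from strings to integers.
    The input rows are left unchanged.
    """
    medians = {}
    for c in columns:
        observed = [int(r[c]) for r in rows if r[c] != MISSING]
        medians[c] = median(observed)
    cleaned = []
    for r in rows:
        new_row = dict(r)
        for c in columns:
            new_row[c] = medians[c] if r[c] == MISSING else int(r[c])
        cleaned.append(new_row)
    return cleaned


# Toy rows with hypothetical column names (not real dataset records).
rows = [
    {"clump_thickness": "5", "bare_nuclei": "1"},
    {"clump_thickness": "3", "bare_nuclei": "?"},
    {"clump_thickness": "8", "bare_nuclei": "10"},
    {"clump_thickness": "1", "bare_nuclei": "3"},
]
columns = ["clump_thickness", "bare_nuclei"]

missing_counts = count_missing(rows, columns)
cleaned = impute_median(rows, columns)
```

Median imputation is only one option. Dropping the affected rows is equally defensible when few values are missing.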

Dimensionality Reduction:

Given the dataset's high dimensionality, I performed Principal Component Analysis (PCA) to reduce the number of variables while preserving essential information. This reduction enhanced the efficiency of subsequent modeling.
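To make the PCA step concrete, the following is a minimal NumPy sketch, not the project's R implementation. It standardizes the features, computes principal components via SVD, and keeps the fewest components whose cumulative explained variance reaches a target. The function name `pca_fit`, the 0.95 default (chosen to mirror the variance figure this README reports), and the synthetic data are assumptions.

```python
import numpy as np


def pca_fit(X, variance_target=0.95):
    """Standardize X, run PCA via SVD, and keep the fewest components
    whose cumulative explained variance reaches variance_target.

    Returns (scores, explained_variance_ratio, number_of_components).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - mean) / std

    # Rows of Vt are the principal directions; squared singular values
    # are proportional to the variance captured along each direction.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)

    # First position where the cumulative ratio reaches the target.
    k = int(np.searchsorted(cumulative, variance_target)) + 1
    k = min(k, len(explained))

    scores = Z @ Vt[:k].T  # data projected onto the first k components
    return scores, explained, k


# Synthetic stand-in data: 9 features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 9))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 9))

scores, explained, k = pca_fit(X)
print(f"kept {k} of {X.shape[1]} components")
```

Standardizing first matters when features are on different scales. If all features share one scale, centering alone may be enough.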

Data Visualization:

I used libraries such as ggplot2, corrplot, and GGally to create visualizations that reveal patterns and correlations within the data.
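A correlation plot is a rendering of the pairwise correlation matrix. As a rough illustration of the numbers behind such a plot, this NumPy sketch computes Pearson correlations and lists the most strongly correlated feature pairs. The helper `strongest_pairs`, the feature names, and the values are made up for the example and are not taken from the dataset.

```python
import numpy as np


def strongest_pairs(X, names, top=3):
    """Return the Pearson correlation matrix of the columns of X and the
    `top` feature pairs with the largest absolute correlation."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pairs.append((names[i], names[j], float(corr[i, j])))
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return corr, pairs[:top]


# Toy values: feature "b" is exactly twice feature "a".
names = ["a", "b", "c"]
X = [
    [1, 2, 9],
    [2, 4, 7],
    [3, 6, 8],
    [4, 8, 1],
    [5, 10, 3],
]

corr, pairs = strongest_pairs(X, names)
for first, second, r in pairs:
    print(f"{first} ~ {second}: r = {r:+.3f}")
```

Strongly correlated pairs like `a` and `b` carry overlapping information, which is the redundancy PCA exploits.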

Machine Learning:

I used k-Nearest Neighbors (k-NN) with cross-validation to classify breast cancer cases as benign or malignant, then evaluated model performance and analyzed classification accuracy.
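The README gives no details of the k-NN setup, so the following is only a sketch of the general technique in plain Python: a majority-vote k-NN classifier scored with shuffled k-fold cross-validation. The function names, the number of folds, the candidate values of k, and the synthetic two-cluster data are all assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k, folds=5, seed=0):
    """Return the mean k-NN accuracy over `folds` shuffled folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    accuracies = []
    for f in range(folds):
        test_idx = indices[f::folds]  # every folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


# Synthetic, well-separated clusters standing in for two PCA components.
rng = random.Random(1)
benign = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(30)]
malignant = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(30)]
X = benign + malignant
y = ["benign"] * 30 + ["malignant"] * 30

# Compare a few odd values of k (odd values avoid ties with two classes).
accuracy_by_k = {k: cross_validate(X, y, k) for k in (1, 3, 5)}
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
```

When PCA precedes k-NN, fitting the PCA inside each training fold rather than on the full dataset keeps the held-out fold from influencing the projection.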

Result:

The analysis resulted in a refined dataset with reduced dimensionality while maintaining over 95% of the original data's variance. This allowed for effective modeling while mitigating the risk of overfitting. The project provided valuable insights into breast cancer diagnosis and improved classification accuracy.
