This repository contains Jupyter notebooks and scripts for Exploratory Data Analysis (EDA) and prediction on a lung cancer dataset. It also includes a Streamlit web application for interactive risk prediction.
The dataset used is the Lung Cancer Prediction dataset, which contains information on patients with lung cancer, including their age, gender, air pollution exposure, alcohol use, dust allergy, occupational hazards, genetic risk, chronic lung disease, balanced diet, obesity, smoking, passive smoking, chest pain, coughing of blood, fatigue, weight loss, shortness of breath, wheezing, swallowing difficulty, clubbing of fingernails, and snoring.
The EDA notebook analyzes the dataset through data cleaning, data visualization, and statistical analysis. It helps identify the factors that contribute to lung cancer and how they relate to one another.
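As a rough illustration of the cleaning and summary steps described above, here is a minimal sketch that uses only the Python standard library. The inline rows are made up for demonstration, and the column names (age, smoking, risk) and helper functions are assumptions rather than the dataset's actual schema or the notebook's code.

```python
import csv
import io
from statistics import mean

# Made-up rows for illustration only; the column names are assumptions,
# not the real dataset's headers.
SAMPLE_CSV = """age,smoking,risk
33,3,low
17,2,low
35,7,high
,8,high
46,6,high
52,2,medium
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per patient row."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_incomplete(rows):
    """Data cleaning: keep only rows where every field is non-empty."""
    return [r for r in rows if all(v.strip() for v in r.values())]


def mean_by_group(rows, value_col, group_col):
    """Statistical summary: mean of value_col within each group_col value."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_col], []).append(float(r[value_col]))
    return {g: mean(vals) for g, vals in groups.items()}


rows = load_rows(SAMPLE_CSV)
clean = drop_incomplete(rows)  # the row with the missing age is dropped
smoking_by_risk = mean_by_group(clean, "smoking", "risk")

print(f"{len(rows)} rows loaded, {len(clean)} kept after cleaning")
print(smoking_by_risk)
```

Grouping a feature by the risk label like this is one simple way to see whether a factor such as smoking tends to be higher among high-risk patients before moving on to plots or formal tests.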
The prediction script includes machine learning models to predict the risk of lung cancer based on the provided features. It also provides an explanation of the models used and their performance.
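To make the idea of training a model and measuring its performance concrete, the sketch below implements a small k-nearest-neighbours classifier and an accuracy check in plain Python. This technique is swapped in purely for illustration; the models the prediction script actually uses are not named here. The feature vectors, labels, and function names are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance to x
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up (smoking, air_pollution) scores; not taken from the real dataset.
train_X = [(1, 2), (2, 1), (2, 2), (7, 6), (8, 7), (7, 8)]
train_y = ["low", "low", "low", "high", "high", "high"]

# A small held-out set used only for evaluation.
test_X = [(1, 1), (8, 8), (3, 2)]
test_y = ["low", "high", "low"]

preds = [knn_predict(train_X, train_y, x) for x in test_X]
print(preds)
print("accuracy:", accuracy(test_y, preds))
```

Evaluating on rows the model has not seen during training, as done with the held-out set here, gives a more honest estimate of performance than scoring on the training data itself.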
The Streamlit web application lets users interact with the dataset and the models through a user-friendly interface. Users can enter their own information and receive a lung cancer risk prediction.
To use this repository, clone it and run the Jupyter notebooks for EDA and prediction. To use the Streamlit web application, navigate to the application directory and start the app with the streamlit run command, passing it the application's script file.
Contributions to this repository are welcome. If you have any suggestions or improvements, please submit a pull request.
This project is licensed under the terms of the MIT license.