This README provides an overview of a machine learning project for heart disease prediction, including the dataset's attributes and the model used for prediction. The dataset combines data from four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, but all published experiments use a subset of 14 of them. The target attribute, "target," indicates whether the patient has heart disease. It is an integer where 0 means no disease and 1 means disease is present.
- Language Used: Python.
- Libraries Used: numpy (as np), pandas (as pd), train_test_split from sklearn.model_selection, LogisticRegression from sklearn.linear_model, and accuracy_score from sklearn.metrics.
- Train-Test Split: 20% test and 80% train data.
- Model Used: Logistic Regression.
- Software Used: Google Colab, Jupyter Notebook.
- Pre-processed Dataset: 303 rows and 14 columns (13 feature columns plus 1 target column).
- Model Accuracy: 82% on the test set.
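The 80/20 split listed above can be sketched with the same libraries. This is a minimal sketch, not the project's notebook: the README does not give the data file name or loading code, so the example uses random placeholder data of the same shape (303 rows, 13 features, 1 target), and the column names, `stratify=Y`, and `random_state=2` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# PLACEHOLDER data: 303 rows x 13 random features plus a random binary target.
# The real project loads its pre-processed dataset instead.
rng = np.random.default_rng(seed=0)
feature_names = [f"feature_{i}" for i in range(13)]  # hypothetical column names
df = pd.DataFrame(rng.normal(size=(303, 13)), columns=feature_names)
df["target"] = rng.integers(0, 2, size=303)

X = df.drop(columns="target")  # the 13 input features
Y = df["target"]               # 0 = no disease, 1 = disease

# 80% train / 20% test. stratify keeps the class balance similar in both parts
# (an assumption; the README only states the split ratio).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2
)
print(X.shape, X_train.shape, X_test.shape)  # (303, 13) (242, 13) (61, 13)
```

scikit-learn rounds the test share up, so 20% of 303 rows gives 61 test rows and 242 training rows.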
The data used in this project was obtained from four different databases: Cleveland, Hungary, Switzerland, and Long Beach V. These databases contain a total of 76 attributes, including the predicted attribute, which is the presence of heart disease in patients. The target variable is binary, with 0 indicating no disease and 1 indicating the presence of heart disease.
The dataset was pre-processed before model training: the data was cleaned, missing values were handled, and the data was checked for consistency. A subset of 14 attributes (13 features plus the target) was then selected for training and testing, following the subset used in previous experiments.
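The README does not say exactly how missing values were handled. As one possible approach, the hypothetical helper below converts every column to numbers (so placeholders such as "?" become missing), drops incomplete rows, and removes exact duplicate rows. Dropping rows instead of imputing values, and deduplicating, are assumptions made for this sketch.

```python
import pandas as pd


def clean_heart_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; the project's exact strategy is not stated."""
    # Coerce every column to numbers; non-numeric placeholders such as "?" become NaN.
    df = raw.apply(pd.to_numeric, errors="coerce")
    # Handle missing values by dropping incomplete rows (imputation is an alternative).
    df = df.dropna()
    # Remove rows that are exact duplicates of an earlier row.
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


# Tiny illustrative input, not taken from the dataset.
raw = pd.DataFrame({
    "age": [63, 37, "?", 63, 41],
    "ca": ["0", "?", "1", "0", "2"],
    "target": [1, 1, 0, 1, 0],
})
clean = clean_heart_data(raw)
print(clean)
```

Rows 1 and 2 are dropped for missing values, and row 3 is dropped as a duplicate of row 0, leaving two rows.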
The machine learning model selected for this project is logistic regression, a binary classification algorithm. The training data, consisting of the selected features and target variable, were used to train the logistic regression model. During training, the model learned the relationships between the input features and the presence or absence of heart disease.
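What "learning the relationships" means for logistic regression can be shown directly: the fitted model computes a linear score z = w·x + b for each patient and turns it into a probability with the sigmoid function 1 / (1 + e^(-z)). The sketch below fits scikit-learn's LogisticRegression on random placeholder data (an assumption standing in for the real 13 features) and checks that this formula reproduces predict_proba. The choice max_iter=1000 is also an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# PLACEHOLDER training data standing in for the 13 selected features.
rng = np.random.default_rng(seed=1)
X_train = rng.normal(size=(242, 13))
# Hypothetical labels that depend on the first two features, so there is a pattern to learn.
Y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

model = LogisticRegression(max_iter=1000)  # raised max_iter so the solver converges
model.fit(X_train, Y_train)

# Linear score per row: z = w . x + b, using the learned weights and intercept.
z = X_train @ model.coef_.ravel() + model.intercept_[0]
# Sigmoid turns the score into P(target == 1).
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = model.predict_proba(X_train)[:, 1]  # column 1 = probability of class 1
print(np.allclose(p_manual, p_sklearn))  # True

# predict() returns class 1 exactly when that probability exceeds 0.5.
labels = model.predict(X_train)
```

Each learned weight in model.coef_ shows how strongly, and in which direction, one feature pushes the prediction toward "disease".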
After training, the logistic regression model was evaluated on the held-out test set (20% of the data). This test set was split from the original dataset before training, so the model never saw it during training. The model's predictions were compared with the actual target values in the test set, giving an accuracy of about 82%.
The test results were analyzed to measure how accurately the logistic regression model predicts heart disease. Accuracy, the fraction of test cases classified correctly, was 82%, which supports the effectiveness of the model in predicting heart disease from the selected features.
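Accuracy is simply the number of correct predictions divided by the number of test rows. With 303 rows and a 20% split, scikit-learn's train_test_split produces 61 test rows, and 50/61 ≈ 0.820 is the only whole-number count that rounds to 82% (49/61 ≈ 0.803 and 51/61 ≈ 0.836). The exact counts are not stated in this README, so the labels below are hypothetical and built only to reproduce that arithmetic; the hand-written function is compared against sklearn.metrics.accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def accuracy_by_hand(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# HYPOTHETICAL test labels: 61 rows (33 with disease, 28 without).
y_test = np.array([1] * 33 + [0] * 28)
y_pred = y_test.copy()
y_pred[:11] = 1 - y_pred[:11]  # flip 11 predictions to simulate errors

print(accuracy_by_hand(y_test, y_pred))  # 50 / 61 ≈ 0.8197
print(accuracy_score(y_test, y_pred))    # same value from scikit-learn
```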
The dataset used in this project dates back to 1988 and comprises data from four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, but our analysis focused on a subset of 14 attributes. The primary target attribute is "target," which indicates the presence of heart disease in patients. A value of 0 represents no disease, while 1 indicates the presence of disease.
Predicting heart disease is of paramount importance in healthcare. Early detection can lead to timely interventions and improved patient outcomes. This project serves as a valuable resource for building predictive models to assist medical professionals in identifying individuals at risk of heart disease.
Understanding the context of this dataset is crucial. Cardiovascular diseases are a leading cause of death worldwide. By leveraging machine learning to predict heart disease, we aim to contribute to early diagnosis and prevention.
For model training, data must be provided in a structured (tabular) format. The input data should include the following attributes:
- age: Age of the patient.
- sex: Sex of the patient (0 = female, 1 = male).
- chest pain type: Type of chest pain (encoded as categorical values).
- resting blood pressure: Resting blood pressure of the patient (mm Hg).
- serum cholesterol: Serum cholesterol level in mg/dl.
- fasting blood sugar: Whether fasting blood sugar is > 120 mg/dl (1 = true, 0 = false).
- resting electrocardiographic results: Results of the resting electrocardiogram (values 0, 1, 2).
- maximum heart rate achieved: Maximum heart rate achieved during exercise.
- exercise induced angina: Presence of exercise-induced angina (1 = yes, 0 = no).
- oldpeak: ST depression induced by exercise relative to rest.
- slope of the peak exercise ST segment: Slope of the peak exercise ST segment.
- number of major vessels: Number of major vessels (0-3) colored by fluoroscopy.
- thal: Thalassemia type (0 = normal; 1 = fixed defect; 2 = reversible defect).
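A single patient record can be packaged as a one-row DataFrame with the 13 attributes in a fixed order before it is passed to the model. The sketch below is a hypothetical helper: the short column names (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal) are an assumption based on the widely shared heart.csv file, and the allowed codes follow the attribute descriptions in this README, with fasting blood sugar and exercise-induced angina assumed to be 0/1 flags. Adjust both if your copy of the data uses different names or encodings.

```python
import pandas as pd

# ASSUMPTION: short column names, in the same order as the attribute list.
FEATURE_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Allowed codes for the coded attributes (fbs and exang assumed to be 0/1 flags).
ALLOWED_CODES = {
    "sex": {0, 1},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "exang": {0, 1},
    "ca": {0, 1, 2, 3},
    "thal": {0, 1, 2},
}


def make_input_row(**values) -> pd.DataFrame:
    """Hypothetical helper: validate one record and return it as a one-row DataFrame."""
    missing = [name for name in FEATURE_COLUMNS if name not in values]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    for name, allowed in ALLOWED_CODES.items():
        if values[name] not in allowed:
            raise ValueError(f"{name}={values[name]!r} not in {sorted(allowed)}")
    # columns= fixes the column order and drops any unexpected extra keys.
    return pd.DataFrame([values], columns=FEATURE_COLUMNS)


# Illustrative values only; not a record from the dataset.
row = make_input_row(
    age=54, sex=1, cp=0, trestbps=130, chol=246, fbs=0, restecg=1,
    thalach=150, exang=0, oldpeak=1.0, slope=1, ca=0, thal=2,
)
print(row.shape)  # (1, 13)
```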
The model generates predictions in the form of binary values:
- 0: No heart disease detected.
- 1: Heart disease detected.
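With scikit-learn, a fitted model's predict method returns an array of class labels, here 0 or 1. The hypothetical helper below only maps those values to the two messages listed above; its name and the exact message strings are illustrative choices.

```python
import numpy as np

# Messages for the two possible model outputs.
LABELS = {0: "No heart disease detected.", 1: "Heart disease detected."}


def describe_predictions(predictions) -> list[str]:
    """Hypothetical helper: turn 0/1 model outputs into readable messages."""
    return [LABELS[int(p)] for p in np.asarray(predictions).ravel()]


print(describe_predictions(np.array([1, 0])))
```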
Thank you for visiting. Enjoy exploring the projects!