This project develops a machine learning model that predicts diabetes from clinical features such as glucose level, BMI, blood pressure, and age. It uses Python with the pandas and scikit-learn libraries.
The dataset used for this project is the Pima Indians Diabetes Database available on Kaggle. It contains several features such as age, BMI, blood pressure, and glucose levels, along with the target variable indicating the presence or absence of diabetes.
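As a rough illustration of how the dataset could be read with pandas, here is a minimal sketch. The column names follow the commonly distributed Kaggle CSV and are an assumption to check against your own copy; `load_diabetes_data` is a hypothetical helper, and the two inline rows are made-up placeholders that only show the expected layout.

```python
import io

import pandas as pd

# Column names as they appear in the commonly distributed Kaggle CSV
# (an assumption; verify against your downloaded file).
FEATURE_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET_COLUMN = "Outcome"  # 1 = diabetes present, 0 = absent


def load_diabetes_data(source):
    """Read the CSV and return (features, target).

    `source` may be a file path or an open file-like object.
    """
    df = pd.read_csv(source)
    missing = set(FEATURE_COLUMNS + [TARGET_COLUMN]) - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    return df[FEATURE_COLUMNS], df[TARGET_COLUMN]


# Two made-up rows, only to show the expected layout; not real patient data.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.45,35,0
5,160,85,30,150,34.1,0.60,50,1
"""

X, y = load_diabetes_data(io.StringIO(SAMPLE_CSV))
print(X.shape)     # (2, 8)
print(y.tolist())  # [0, 1]
```

With the real file, the same helper would be called with a path instead, for example `load_diabetes_data("diabetes.csv")`, where the file name is whatever you saved the download as.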
- Create a Python virtual environment (virtualenv or Anaconda):
- Install Python and required libraries:
scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support-vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn is a NumFOCUS fiscally sponsored project.
- python
- Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule. Python is dynamically typed and garbage-collected.
- pandas
- pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.
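
To show how these libraries might fit together for the prediction task, here is a minimal sketch of a training and evaluation step. The project description does not name an algorithm, so logistic regression with feature scaling is used purely as an assumed baseline. `train_and_evaluate`, the `Outcome` column name, the 80/20 split, and the `diabetes.csv` file name are likewise illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(df, target_column="Outcome", test_size=0.2, seed=42):
    """Fit a baseline classifier and return (model, held-out accuracy).

    Hypothetical helper: every column other than `target_column` is
    treated as a numeric feature.
    """
    X = df.drop(columns=[target_column])
    y = df[target_column]

    # Stratify so both splits keep a similar share of positive cases.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Scale the features, then fit logistic regression (assumed baseline).
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Example usage, assuming the Kaggle file was saved locally as "diabetes.csv":
# df = pd.read_csv("diabetes.csv")
# model, accuracy = train_and_evaluate(df)
# print(f"Held-out accuracy: {accuracy:.3f}")
```

Wrapping the scaler and classifier in one pipeline means the scaling statistics are learned from the training split only, which avoids leaking information from the held-out rows.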