The goal of this project is to clean the NYPD Shooting Incidents Dataset, explore it, and use the cleaned data to generate a descriptive summary of its features and contents. The dataset includes all shooting crimes reported to the New York City Police Department (NYPD) from 2006 to the end of 2019.
Demo application of the crime predictor: https://nypd-crime-predictor.herokuapp.com/
Data cleaning is an important step in any data science project. In this step, we clean the dataset by applying the following methods:
- Removing any duplicates present in the dataset.
- Converting object-typed columns to categorical data so they can be encoded in later stages.
- Dealing with missing values.
- Text cleaning of categorical feature names.
- Keeping only sensible age ranges.
- Converting date and time to the datetime64 datatype and splitting them into separate columns.
- Encoding categorical values as numeric codes.
- Removing any unnecessary columns.
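
The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`OCCUR_DATE`, `OCCUR_TIME`, `BORO`, `VIC_AGE_GROUP`, `LOCATION_DESC`), the list of valid age groups, the `UNKNOWN` fill value, the order of the steps, and the toy rows are all assumptions made for the example.

```python
import pandas as pd

# Assumed set of "sensible" age ranges; adjust to the values in the real data.
VALID_AGE_GROUPS = ["<18", "18-24", "25-44", "45-64", "65+"]


def clean_shootings(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a frame with the assumed columns."""
    # Remove exact duplicate rows.
    df = df.drop_duplicates().copy()

    # Remove a column we assume is not needed for the summary.
    df = df.drop(columns=["LOCATION_DESC"], errors="ignore")

    # Text cleaning: trim whitespace and normalise case.
    for col in ["BORO", "VIC_AGE_GROUP"]:
        df[col] = df[col].str.strip().str.upper()

    # Missing values: fill with a placeholder (one possible policy).
    df["BORO"] = df["BORO"].fillna("UNKNOWN")

    # Keep only rows whose age range is in the valid list.
    df = df[df["VIC_AGE_GROUP"].isin(VALID_AGE_GROUPS)].copy()

    # Parse date and time into datetime64, then split into separate columns.
    stamp = pd.to_datetime(
        df["OCCUR_DATE"] + " " + df["OCCUR_TIME"], format="%m/%d/%Y %H:%M:%S"
    )
    df["YEAR"] = stamp.dt.year
    df["MONTH"] = stamp.dt.month
    df["HOUR"] = stamp.dt.hour
    df = df.drop(columns=["OCCUR_DATE", "OCCUR_TIME"])

    # Convert to categorical dtypes, then encode as integer codes.
    df["BORO"] = df["BORO"].astype("category")
    age_dtype = pd.CategoricalDtype(categories=VALID_AGE_GROUPS, ordered=True)
    df["VIC_AGE_GROUP"] = df["VIC_AGE_GROUP"].astype(age_dtype)
    for col in ["BORO", "VIC_AGE_GROUP"]:
        df[col + "_CODE"] = df[col].cat.codes

    return df.reset_index(drop=True)


# Toy rows invented for illustration; they are not taken from the dataset.
raw = pd.DataFrame(
    {
        "OCCUR_DATE": ["01/15/2019", "01/15/2019", "03/02/2018", "07/04/2017", "09/09/2016"],
        "OCCUR_TIME": ["23:10:00", "23:10:00", "02:05:00", "14:30:00", "08:00:00"],
        "BORO": [" Bronx ", " Bronx ", "QUEENS", None, "brooklyn"],
        "VIC_AGE_GROUP": ["25-44", "25-44", "18-24", "<18", "940"],
        "LOCATION_DESC": ["x", "x", None, "y", "z"],
    }
)

cleaned = clean_shootings(raw)
print(cleaned)
```

In this sketch the age groups use an ordered categorical so that the integer codes follow the age order, while `BORO` uses the default alphabetical category order. Dropping rows with an out-of-range age group, rather than relabelling them, is also just one possible choice.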