- Data sources
- Data cleaning
- Data exploration and analysis
- Time series analysis
- Machine learning
- Conclusion
The data is acquired from the City of Seattle Open Data portal.
It is also available through the Socrata API (Dev.Socrata).
Start with some initial exploration to get a general feel for the data.
Let's start by asking a few questions:
- What are the features
- What are the expected types (int, float, string, boolean)?
- Is there obvious missing data?
- Are there other types of missing data that are not so obvious?
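
The questions above can be checked quickly with pandas. The following is a minimal sketch on a tiny made-up frame; the column names are assumptions standing in for the real features, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the crisis data.
# Column names are hypothetical, chosen only for illustration.
df = pd.DataFrame({
    "officer_id": [101, 102, 103, 104],
    "officer_gender": ["M", "F", None, "M"],
    "officer_years_of_experience": [3.0, np.nan, 7.0, 2.0],
    "cit_certified": [True, False, True, False],
})

# What are the features, and which types did pandas infer for them?
feature_types = df.dtypes

# Is there obvious (standard) missing data? Count NaN/None per column.
missing_per_column = df.isna().sum()

print(feature_types)
print(missing_per_column)
```

Less obvious missing data (placeholder strings, impossible values) will not show up in `isna()` and needs a closer look at the distinct values of each column.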
Things to deal with:
- Standard missing values (Pandas can detect them)
- Non-standard missing values (different formats)
- Unexpected missing values (can be a mix of the above two or something entirely different)

These can be dealt with by removing, replacing, or converting values, or by some combination of these.
Finally, summarize any missing values.
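
As an illustration of the three kinds of missing values, here is a hedged sketch. The placeholder strings (`"-"`, `"n/a"`, `"Unknown"`), the column names, and the rule that negative experience is invalid are all assumptions made for the example, not values confirmed from the data set.

```python
import numpy as np
import pandas as pd

# Made-up values; placeholder strings and column names are hypothetical.
raw = pd.DataFrame({
    "officer_race": ["White", "-", "Unknown", "Asian", None],
    "officer_years_of_experience": ["3", "n/a", "7", "-1", "2"],
})

# Non-standard missing values: map placeholder strings to NaN
# so that pandas can detect them like standard missing values.
placeholders = ["-", "n/a", "Unknown"]
cleaned = raw.replace(placeholders, np.nan)

# Unexpected missing values: convert to numbers (unparseable -> NaN),
# then treat impossible values (negative experience) as missing too.
years = pd.to_numeric(cleaned["officer_years_of_experience"], errors="coerce")
cleaned["officer_years_of_experience"] = years.where(years >= 0)

# Summarize missing values per column.
missing_summary = cleaned.isna().sum()
print(missing_summary)
```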
The final processed or refined data info looks like this.
Additional things done for refinement
- Converted strings to categorical
- Converted time to datetime format
- Replaced missing or inconsistent floating-point values with the mean
- Clubbed values that could not be categorized, or had few data points, into a separate category
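
The refinement steps listed above could look roughly like the sketch below. The column names, the `"OTHER"` label, and the `min_count` threshold are assumptions for illustration; the data cleaning notebook may do this differently.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names are hypothetical.
df = pd.DataFrame({
    "officer_precinct_desc": ["EAST PCT", "WEST PCT", "EAST PCT", "RARE UNIT", "WEST PCT"],
    "reported_time": ["2017-01-03 10:15:00", "2017-01-04 23:40:00",
                      "2017-02-11 08:05:00", "2017-02-12 14:30:00",
                      "2017-03-01 19:20:00"],
    "officer_years_of_experience": [3.0, np.nan, 7.0, 2.0, np.nan],
})

# Convert time strings to datetime.
df["reported_time"] = pd.to_datetime(df["reported_time"])

# Replace missing floating-point values with the column mean.
mean_experience = df["officer_years_of_experience"].mean()
df["officer_years_of_experience"] = (
    df["officer_years_of_experience"].fillna(mean_experience)
)

# Club categories with too few data points into a separate category.
min_count = 2  # hypothetical threshold
counts = df["officer_precinct_desc"].value_counts()
rare = counts[counts < min_count].index
df["officer_precinct_desc"] = df["officer_precinct_desc"].where(
    ~df["officer_precinct_desc"].isin(rare), "OTHER"
)

# Convert strings to categorical.
df["officer_precinct_desc"] = df["officer_precinct_desc"].astype("category")

print(df.dtypes)
```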
The data cleaning notebook provides details on what was done to refine the data.
Visualizing features
- Officer Squad desc
  There is one category that spikes, so let's get the top 50 categories.
- Top 50 Officer squad
  TRAINING - FIELD TRAINING SQUAD is the one category that stands out in dealing with crisis.
- Officer precinct desc
  Most of the occurrences happened in SOUTH, WEST, EAST, NORTH, and SOUTHWEST PCT.
- Officer bureau desc
  OPERATIONS BUREAU is the main bureau dealing with crisis.
- Officer precinct vs Officer bureau
  In the EAST, NORTH, SOUTH, SOUTHWEST, and WEST PCT precincts, OPERATIONS BUREAU is the one dealing with crisis.
- Initial call type
  The top 2 are related to suicide and emotional crisis.
- Final call type
  Since this data is related to crisis, the final call types are crisis-related as well.
- Officer year of birth vs Officer gender
  The majority have less than 4 years of experience.
- Officer year of birth vs Officer gender
  i) Most officers were born between 1977 and 1992.
  ii) From 1985 to 1990 there is an uptick in hires, but fewer female officers were hired.
  iii) From 1985 onward there is an upward trend of hiring more female officers.
- Officer race vs Officer gender
  Most female officers are white.
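
The counts behind plots like these can be computed with `value_counts` and `pd.crosstab`. This is a minimal sketch on made-up rows; the column names and squad labels are hypothetical, and `top_n` is set to 2 here only because the toy data is small (the write-up uses the top 50).

```python
import pandas as pd

# Made-up rows; column names and squad labels are hypothetical.
df = pd.DataFrame({
    "officer_squad_desc": ["SQUAD A", "SQUAD A", "SQUAD B",
                           "SQUAD A", "SQUAD C", "SQUAD B"],
    "officer_precinct_desc": ["EAST PCT", "WEST PCT", "EAST PCT",
                              "EAST PCT", "NORTH PCT", "WEST PCT"],
    "officer_bureau_desc": ["OPERATIONS BUREAU", "OPERATIONS BUREAU",
                            "OPERATIONS BUREAU", "OPERATIONS BUREAU",
                            "OTHER BUREAU", "OPERATIONS BUREAU"],
})

# Top-N most frequent squads (sorted by count, descending).
top_n = 2
top_squads = df["officer_squad_desc"].value_counts().head(top_n)

# Precinct vs bureau: number of rows for each combination.
precinct_by_bureau = pd.crosstab(
    df["officer_precinct_desc"], df["officer_bureau_desc"]
)

print(top_squads)
print(precinct_by_bureau)
```

Either result can be passed to a plotting library to produce bar charts or a heatmap.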
There are 952 unique officers in this data set.
Instead of dealing with all the cases, let's deal with officers who handled more than 100 cases.
We extract that subset of the data and analyze it.
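
One way to pick out the busiest officers is sketched below. The `officer_id` column name is an assumption, and `min_cases` is set to 2 only so the toy data yields a result (the analysis uses 100).

```python
import pandas as pd

# Made-up rows: one row per case; the column name is hypothetical.
df = pd.DataFrame({"officer_id": [1, 1, 1, 2, 2, 3]})

# Number of distinct officers in the data.
n_unique_officers = df["officer_id"].nunique()

# Count cases per officer and keep officers above the threshold.
min_cases = 2  # the analysis uses 100
cases_per_officer = df["officer_id"].value_counts()
busy_officers = cases_per_officer[cases_per_officer > min_cases].index

# Keep only the cases handled by those officers.
busy_df = df[df["officer_id"].isin(busy_officers)]

print(n_unique_officers, list(busy_officers), len(busy_df))
```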
- Reported time and Occurred time
  There is no reporting before 05-15-2016, so dates before 05-15-2016 were removed.
- Reported time and Occurred time after 05-15-2016
  There are certain time frames with more crisis occurrences.
- Distplot for time difference in days
  Usually it is handled within a day.
- Distplot for time difference in hours
  Most of the time it is handled within 5 hours of the reported time.
- Reported vs Occurred time difference in days
  From the above graphs we see occurrences spiking during the early part of the year.
- Time difference between Reported and Occurred (in seconds)
  As the year goes on, the time difference increases, and early in the year the time difference decreases.
Observing these patterns, crisis cases spike during the early part of a year, which is also when the crisis team is active and the response time is shorter.
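
The time differences discussed above can be derived along these lines. This is a sketch under assumptions: the column names are hypothetical, and the difference is taken as reported minus occurred, which may not match the direction used in the Time Series notebook.

```python
import pandas as pd

# Made-up rows; column names are hypothetical.
df = pd.DataFrame({
    "occurred_time": ["2016-05-01 09:00:00", "2016-06-01 09:00:00",
                      "2017-01-10 22:00:00"],
    "reported_time": ["2016-05-01 12:00:00", "2016-06-01 11:30:00",
                      "2017-01-12 22:00:00"],
})
df["occurred_time"] = pd.to_datetime(df["occurred_time"])
df["reported_time"] = pd.to_datetime(df["reported_time"])

# Remove rows reported before the cutoff date.
cutoff = pd.Timestamp("2016-05-15")
df = df[df["reported_time"] >= cutoff].copy()

# Time difference (assumed direction: reported minus occurred).
delta = df["reported_time"] - df["occurred_time"]
df["diff_seconds"] = delta.dt.total_seconds()
df["diff_hours"] = df["diff_seconds"] / 3600
df["diff_days"] = df["diff_seconds"] / 86400

# Cases per calendar month, to look for spikes within a year.
monthly_counts = df.groupby(df["reported_time"].dt.to_period("M")).size()

print(df[["diff_seconds", "diff_hours", "diff_days"]])
print(monthly_counts)
```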
For further information about exploration and analysis, refer to the Data exploration and analysis notebook; for time series, refer to the Time Series notebook.