Seattle_Crisis_Data_Analysis

Data sources

The data is acquired from the City of Seattle Open Data portal.

It is also available through the Socrata API (Dev.Socrata).
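
As a rough sketch of how the API route could look, the snippet below builds a paged request URL in the general Socrata (SODA) style. The domain `data.seattle.gov`, the helper name `build_soda_url`, and the placeholder dataset identifier `xxxx-xxxx` are assumptions for illustration only; the real identifier is listed on the dataset's page in the portal.

```python
from urllib.parse import urlencode


def build_soda_url(domain: str, dataset_id: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a paged SODA-style request URL for a JSON resource (hypothetical helper)."""
    # $limit and $offset are the standard SODA paging parameters;
    # urlencode percent-encodes the leading "$" as %24.
    query = urlencode({"$limit": limit, "$offset": offset})
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# "xxxx-xxxx" is a placeholder, not the real dataset identifier.
url = build_soda_url("data.seattle.gov", "xxxx-xxxx", limit=500, offset=1000)
print(url)
```

A URL built this way could then be passed to `pandas.read_json` to load one page of records into a DataFrame.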

Data cleaning

Start with some initial exploration to get a general feel for the data.

Let's start by asking a few questions:

  • What are the features?
  • What are the expected types (int, float, string, boolean)?
  • Is there obvious missing data?
  • Are there other types of missing data that are not so obvious?

results_df_info
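
The questions above map onto a few standard Pandas calls. The following is a minimal sketch on a toy frame; the column names and values are assumptions chosen to resemble the data set, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "officer_gender": ["M", "F", None, "M"],
    "officer_years_of_experience": [3.0, np.nan, 12.0, 7.0],
    "use_of_force_indicator": ["N", "N", "Y", "-"],
})

# What are the features, and what types did Pandas infer for them?
print(df.dtypes)

# Obvious missing data: values Pandas already recognizes as missing.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Less obvious missing data: list every distinct value, including
# placeholders such as "-" that Pandas does not treat as missing.
print(df["use_of_force_indicator"].value_counts(dropna=False))
```

Note that the `"-"` placeholder is not counted by `isna()`, which is why scanning the distinct values of each column is a useful second pass.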

Things to deal with

  • Standard missing values (Pandas can detect them)
  • Non-standard missing values (different formats)
  • Unexpected missing values (can be a mix of the above two or something totally different)

They can be dealt with by removing, replacing, or converting values, or by some combination of these.

Finally, summarize any missing values.
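
One possible way to handle the three kinds of missing values is sketched below. The list of placeholder markers (`NON_STANDARD_MISSING`) and the column names are assumptions for illustration; the markers actually present should be read off the data itself.

```python
import pandas as pd

# Toy data; column names and placeholder markers are illustrative assumptions.
raw = pd.DataFrame({
    "officer_race": ["White", "-", "Unknown", None, "Asian"],
    "officer_year_of_birth": ["1985", "n/a", "1990", "", "1978"],
})

# Non-standard missing values: blank out every known placeholder so that
# Pandas sees them as missing (mask replaces matching cells with NaN).
NON_STANDARD_MISSING = ["-", "n/a", "Unknown", ""]
cleaned = raw.mask(raw.isin(NON_STANDARD_MISSING))

# Unexpected values: coerce anything that is not a number to NaN.
cleaned["officer_year_of_birth"] = pd.to_numeric(
    cleaned["officer_year_of_birth"], errors="coerce"
)

# Summarize the missing values per column.
missing_summary = cleaned.isna().sum()
print(missing_summary)
```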

The final processed or refined data info looks like this.
results_df_info2

Additional things done for refinement

  • Converted strings to categorical
  • Converted time to datetime format
  • Took the mean for floating-point numbers where values were missing or inconsistent
  • Where values could not be categorized or had too few data points, grouped them into a separate category
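
The refinement steps listed above could look roughly like the sketch below. The column names, the `MIN_COUNT` threshold, and the `"OTHER"` label are assumptions for illustration and may differ from what the Data cleaning notebook does.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "officer_race": ["White", "White", "Asian", "White", "Hispanic", "Asian"],
    "occurred_date_time": [
        "2017-01-05T10:30:00", "2017-01-06T23:10:00", "2017-02-01T08:00:00",
        "2017-02-03T12:00:00", "2017-03-09T01:45:00", "2017-03-10T17:20:00",
    ],
    "officer_years_of_experience": [3.0, np.nan, 12.0, 7.0, np.nan, 2.0],
})

# Convert time strings to datetime.
df["occurred_date_time"] = pd.to_datetime(df["occurred_date_time"])

# Fill missing floating-point values with the column mean.
mean_exp = df["officer_years_of_experience"].mean()
df["officer_years_of_experience"] = df["officer_years_of_experience"].fillna(mean_exp)

# Group categories with too few data points into a separate category,
# then convert the string column to a categorical.
MIN_COUNT = 2  # hypothetical threshold
counts = df["officer_race"].value_counts()
rare = counts[counts < MIN_COUNT].index
df["officer_race"] = (
    df["officer_race"].where(~df["officer_race"].isin(rare), "OTHER").astype("category")
)
print(df.dtypes)
```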

The Data cleaning notebook provides details on what has been done to refine the data.

Data exploration and analysis

Visualizing features

  • Beat
    beat

  • Sector and Precinct
    sector_precinct

  • Sector vs Precinct (combining above graphs)
    sector_v_precinct

  • Sector vs Officer race
    sector_v_race

  • Sector vs Officer gender count
    sector_v_officer_gender

  • Officer Squad desc
    officer_squad_desc
    One category spikes, so let's look at the top 50 categories.

  • Top 50 Officer squad
    officer_squad_desc_top50
    TRAINING - FIELD TRAINING SQUAD is one category that has been dealing with crisis.

  • Officer precinct desc
    officer_precinct_desc
    Most of the occurrences happened in SOUTH, WEST, EAST, NORTH and SOUTHWEST PCT.

  • Officer precinct vs Officer squad
    officer_precinct_desc_v_officer_precinct_desc

  • Officer bureau desc
    officer_bureau_desc
    OPERATIONS BUREAU is the main bureau dealing with crisis.

  • Officer precinct vs Officer bureau
    officer_precinct_desc_v_officer_bureau_desc
    EAST, NORTH, SOUTH, SOUTHWEST and WEST PCT are the precincts where OPERATIONS BUREAU is the one dealing with crisis.

  • Officer years of experience and year of birth
    officer_year_of_exp_year_of_birth

  • Officer race
    officer_race
    Majority are White officers.

  • Initial call type
    initial_call_type
    The top two are related to suicide and emotional crisis.

  • Final call type
    final_call_type
    Since this data is related to crisis, the final call types are, as expected, mostly crisis-related.

  • Call type
    call_type
    Most are telephonic.

  • Officer years of experience vs Officer gender
    officer_year_of_exp_v_officer_gender
    The majority have less than 4 years of experience.

  • Officer year of birth vs Officer gender
    officer_year_of_birth_v_officer_gender
    i) Most of them were born between 1977 and 1992.
    ii) From 1985 to 1990 there is an uptick in hires, but fewer female officers were hired.
    iii) From 1985 onward there is an upward trend of hiring more female officers.

  • Officer race vs Officer gender
    officer_race_v_officer_gender
    Most female officers are white.

  • Officer years of experience vs Use of force indicator
    officer_year_of_exp_v_force_ind

  • Officer year of birth vs Use of force indicator
    officer_year_of_birth_v_force_ind

  • Officer race vs Use of force indicator
    officer_race_v_force_ind

  • Disposition count
    disposition

  • Disposition vs Initial call type
    disposition_v_inital_call_type

  • Disposition vs Final call type
    disposition_v_final_call_type
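
The single-feature counts and the "X vs Y" comparisons above can be derived with `value_counts` and `pd.crosstab`. This is a minimal sketch on toy data; the column names and values are assumptions, and the notebook may build its plots differently.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "sector": ["EDWARD", "EDWARD", "KING", "KING", "KING", "NORA"],
    "officer_gender": ["M", "F", "M", "M", "F", "M"],
    "officer_squad_desc": ["SQUAD A", "SQUAD A", "SQUAD B", "SQUAD A", "SQUAD B", "SQUAD C"],
})

# Top-N categories of one feature (the analysis above uses N = 50).
top_squads = df["officer_squad_desc"].value_counts().head(2)
print(top_squads)

# Counts for one feature against another, e.g. Sector vs Officer gender.
sector_by_gender = pd.crosstab(df["sector"], df["officer_gender"])
print(sector_by_gender)
```

Calling `.plot(kind="bar")` on either result (with matplotlib installed) gives a bar chart of the counts.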

There are 952 unique officers in this data set.

Instead of dealing with all the cases, let's focus on officers who handled more than 100 cases.
The data for those officers is extracted and analyzed below.
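
Counting unique officers and extracting the cases of the busiest ones might be done along these lines. The `officer_id` column name is an assumption, and the toy threshold `MIN_CASES = 2` stands in for the 100 used in the analysis.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "officer_id": [101, 101, 101, 102, 103, 103],
    "call_type": ["911", "911", "ONVIEW", "911", "TELEPHONE OTHER, NOT 911", "911"],
})

# Number of unique officers in the data set.
n_officers = df["officer_id"].nunique()

# Keep only the rows of officers who handled more than MIN_CASES cases.
MIN_CASES = 2  # toy threshold; the analysis uses 100
cases_per_officer = df["officer_id"].value_counts()
busy_ids = cases_per_officer[cases_per_officer > MIN_CASES].index
busy_df = df[df["officer_id"].isin(busy_ids)]
print(n_officers, len(busy_df))
```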

  • Call type
    call_type_top100

  • Final Call type
    final_call_type_top100

  • Officer race
    officer_race_top100

  • Officer precinct vs Officer bureau
    officer_precinct_desc_v_officer_bureau_desc_top100

  • Officer squad desc
    officer_squad_desc_top100

  • Officer precinct desc
    officer_precinct_desc_top100

  • Sector vs Officer gender
    sector_v_officer_gender_top100

  • Sector vs Officer race
    sector_v_race_top100

  • Officer precinct
    officer_precinct_desc

  • Officer precinct vs Template
    officer_precinct_desc_df_template_grp_ge2

  • Officer year of experience greater than 30
    officer_year_of_exp_mean_ge30yrs

Time series analysis

  • Reported time and Occurred time
    ts_day_time
    There are no reports before 05-15-2016, so data before 05-15-2016 was removed.

  • Reported time and Occurred time after 05-15-2016
    ts_day_time_after_05
    There are certain time frames where there are more Crisis occurrences.

  • Time difference percentile
    reported_minus_occurred_percentile

  • Time difference percentile after 05-15-2016
    reported_minus_occurred_percentile_after_05

  • Distplot for time difference in days
    ts_days_distplot
    Usually it's handled within a day.

  • Distplot for time difference in hours
    ts_hours_displot
    Most of the time it is handled within 5 hours of the reported time.

  • Reported vs Occurred date time plot
    ts_reported_v_occurred_date_time_after_05

  • Reported vs Occurred time difference (15 minute rounding)
    ts_15min_rounding_after_05

  • Reported vs Occurred time difference in hours
    ts_rolling_hours_after_05

  • Reported vs Occurred time difference in days
    ts_rolling_day_time_after_05
    From the above graphs we see occurrences spiking during the early part of the year.

  • Difference of hours
    ts_hours_diff_after_05

  • Difference of days
    ts_day_diff_after_05

  • Difference of hours
    ts_24_hours_diff_after_05

  • Difference of weekdays
    ts_weekday_diff_after_05

  • Time difference between Reported and occurred (in seconds)
    ts_reported_minus_occurred_date_time_diff
    As the year goes on, the time difference increases, and early in the year the time difference decreases.
    Observing the patterns, crisis cases spike during the early part of a year, which is when the crisis teams are active and the response time is shorter.
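
The time-series steps above (dropping records before 05-15-2016, taking the reported minus occurred difference in several units, rounding to 15 minutes, and computing percentiles) can be sketched as follows. The column names, including a combined `reported_date_time` column, are assumptions for illustration; the Time Series notebook may organize this differently.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "occurred_date_time": pd.to_datetime([
        "2015-03-01 08:00", "2016-06-01 10:00", "2016-06-02 22:07", "2017-01-10 09:00",
    ]),
    "reported_date_time": pd.to_datetime([
        "2015-03-01 09:00", "2016-06-01 12:30", "2016-06-03 01:07", "2017-01-12 09:00",
    ]),
})

# Keep only records reported on or after the cutoff date.
CUTOFF = pd.Timestamp("2016-05-15")
recent = df[df["reported_date_time"] >= CUTOFF].copy()

# Reported minus occurred, expressed in seconds, hours and whole days.
diff = recent["reported_date_time"] - recent["occurred_date_time"]
recent["diff_seconds"] = diff.dt.total_seconds()
recent["diff_hours"] = recent["diff_seconds"] / 3600
recent["diff_days"] = diff.dt.days

# Round the occurred time to the nearest 15 minutes.
recent["occurred_15min"] = recent["occurred_date_time"].dt.round("15min")

# Percentiles of the time difference in hours.
percentiles = recent["diff_hours"].quantile([0.5, 0.9])
print(recent[["diff_hours", "diff_days", "occurred_15min"]])
print(percentiles)
```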

For further information about exploration and analysis, refer to the Data exploration and analysis notebook; for time series, refer to the Time Series notebook.

Machine learning

Conclusion
