An Examination of Fatal Force by Police in the US

Project Status: Completed

Project Objective

The purpose of this project is to examine the factors involved in fatal shootings by police in the US. Which factors carry the most weight in fatal shootings, and are any of them predictive? Is it race? State? Mental illness? Based on our findings from the available variables in the dataset, we aim to predict the deceased's race or mental illness status.

Methods Used

  • Data Cleaning
  • Exploratory Data Analysis
  • Data Visualization
  • Machine Learning

Technologies:

  • Python
  • Pandas
  • NumPy
  • re
  • Seaborn
  • Matplotlib
  • Copy
  • Geopandas
  • Folium
  • Geopy
  • Nominatim
  • Template
  • MacroElement
  • PrettyTable
  • Sklearn - preprocessing
  • Sklearn.linear_model - LogisticRegression
  • Sklearn.preprocessing - StandardScaler
  • Sklearn.metrics - accuracy_score
  • Sklearn.svm - SVC
  • Sklearn.tree - DecisionTreeClassifier
  • Sklearn.ensemble - RandomForestClassifier
  • Sklearn.model_selection - cross_val_score
  • Sklearn.model_selection - RandomizedSearchCV

Project Description:

  • We used a dataset from the Washington Post with over 5,700 data points collected between 2015 and 2020.
  • Cleaned the data using pandas.
  • For feature engineering, 9 of the 17 variables had to be converted to different data types.
  • To read more about the data cleaning process, click here.
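The type conversions can be sketched with pandas as below. This is an illustrative sketch, not the project's actual cleaning code: the column names are assumed to match the Washington Post file, and the three rows are invented.

```python
import pandas as pd

# Toy rows shaped like the Washington Post data; column names and values
# are assumptions for illustration, not real records.
raw = pd.DataFrame({
    "date": ["2015-01-02", "2016-07-15", "2020-03-09"],
    "age": ["53", "", "24"],
    "gender": ["M", "F", "M"],
    "race": ["A", "W", "B"],
    "signs_of_mental_illness": ["True", "False", "False"],
    "body_camera": ["False", "True", "False"],
})

def convert_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each column cast to a more useful dtype."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])                # string -> datetime64
    out["age"] = pd.to_numeric(out["age"], errors="coerce")  # blank strings become NaN
    for col in ["gender", "race"]:
        out[col] = out[col].astype("category")               # string -> category
    for col in ["signs_of_mental_illness", "body_camera"]:
        out[col] = out[col].map({"True": True, "False": False})  # string -> bool
    return out

clean = convert_types(raw)
print(clean.dtypes)
```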

Project Results:

Because race was a central question going into this project, we started by counting the victims by race. Across the whole dataset, White individuals were killed most often, and the number of Black victims was almost half the number of White victims.
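Counts and shares by race can be computed with pandas' `value_counts`, as in the minimal sketch below. The race codes are a hypothetical sample, not the real dataset.

```python
import pandas as pd

# Hypothetical sample of race codes (W, B, H, N); not the real dataset.
race = pd.Series(["W", "W", "B", "W", "H", "B", "W", "N"], name="race")

counts = race.value_counts()                # victims per race, sorted descending
shares = race.value_counts(normalize=True)  # same, as a fraction of all victims

# Ratio of Black to White victims, the comparison made in the text
ratio_b_to_w = counts["B"] / counts["W"]
print(shares.round(3))
print(f"B/W ratio: {ratio_b_to_w:.2f}")
```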

Continuing the same line of questioning, we looked at where the shootings took place, by state, with respect to the race of the victims:

The majority of Hispanic victims were shot by the police in Texas, New Mexico, and California, as the concentration of purple dots shows. The pink dots (Native American victims) are mostly located in the central part of the country, so we could possibly conclude that Native Americans are in the most danger there. Unsurprisingly, the majority of the victims are White, and the green dots are spread all over the country. What is especially interesting is that most of the yellow dots (Black victims) are on the East Coast: there are far more on the right side of the map than on the left, and noticeably fewer on the West Coast. This could indicate that Black people are in more danger on the East Coast when they interact with the police.
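A numeric counterpart to reading the map is a state-by-race cross-tabulation. The sketch below is a swapped-in check, not how the map was built; the records are hypothetical, and the counts are raw, not population-adjusted.

```python
import pandas as pd

# Hypothetical (state, race) records; not real data.
df = pd.DataFrame({
    "state": ["TX", "CA", "NM", "TX", "FL", "GA", "NY", "OK", "CA", "TX"],
    "race":  ["H",  "H",  "H",  "W",  "W",  "B",  "B",  "N",  "W",  "H"],
})

# Rows: states, columns: race codes, cells: number of victims
table = pd.crosstab(df["state"], df["race"])

# For each race, the three states with the most victims
top_states = {r: table[r].nlargest(3).index.tolist() for r in table.columns}
print(table)
print(top_states)
```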

Next, we wanted to see whether the number of shootings varies by state.

The largest numbers of victims were shot in California, Texas, and Florida, which makes sense because they are the three most populous states, while smaller states show fewer victims. To account for population, we calculated the number of victims per capita by dividing each state's fatal shooting count by its population and multiplying the result by 100,000.
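The per capita rate described above is a one-line pandas operation. In this sketch the states and all numbers are made-up toy values, chosen only to show how a state with the highest raw count can drop to the bottom once population is taken into account.

```python
import pandas as pd

# Toy values for illustration only: NOT real shooting counts or populations.
shootings = pd.Series({"State A": 300, "State B": 40, "State C": 15})
population = pd.Series({"State A": 30_000_000, "State B": 2_000_000, "State C": 500_000})

# Rate per 100,000 residents = count / population * 100,000
per_100k = (shootings / population * 100_000).sort_values(ascending=False)
print(per_100k)
```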

Surprisingly, the new top three, Alaska, New Mexico, and Oklahoma, are relatively small states by population. This means there are more shootings relative to population in these states than in larger states such as California. We then created a heatmap showing the concentration of shootings per 100,000 residents, using the same rates as in the previous bar graph.

The heatmap shows that the southwestern region around New Mexico and Oklahoma has a high per capita rate of police shootings. Alaska, given its size, may be an outlier here.

The plot below shows that the distributions of gender, race, and signs of mental illness are imbalanced.
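Because the classes are imbalanced, a useful reference point for the accuracy scores reported later is the accuracy of always predicting the most frequent class. A minimal sketch with scikit-learn's `DummyClassifier` follows; the labels are invented, so the baseline it computes says nothing about the real data.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical signs_of_mental_illness labels (1 = signs present); not real data.
y = pd.Series([0, 0, 0, 1, 0, 0, 1, 0], name="signs_of_mental_illness")
X = pd.DataFrame({"placeholder": range(len(y))})  # ignored by the dummy model

print(y.value_counts(normalize=True))  # share of each class

# Accuracy obtained by always predicting the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_acc = accuracy_score(y, baseline.predict(X))
print(f"majority-class accuracy: {baseline_acc:.3f}")
```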

The next plot focuses on the "body cam footage" feature and checks whether it might help determine whether an individual was shot. It shows that many officers had their cameras off, which skews this feature and makes it hard to use as a determining variable.

The box plot below shows the age distribution by race. For all races, most victims are between 22 and 47 years old, and the mean age is 37. Note that we replaced the 262 null values in the "age" column with the mean age of the corresponding race and gender group.
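Group-wise mean imputation like this can be done with `groupby(...).transform("mean")`, which broadcasts each group's mean back to its rows. The sketch below uses hypothetical rows; note that a group whose ages are all missing would remain NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; not real data.
df = pd.DataFrame({
    "race":   ["W", "W", "W", "B", "B", "B"],
    "gender": ["M", "M", "M", "M", "M", "F"],
    "age":    [30.0, 50.0, np.nan, 20.0, np.nan, 40.0],
})

# Mean age of each (race, gender) group, aligned to every row (NaNs skipped)
group_mean = df.groupby(["race", "gender"])["age"].transform("mean")
df["age"] = df["age"].fillna(group_mean)
print(df)
```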

To build a model that predicts signs of mental illness, we used the following ML algorithms:

  • Logistic Regression
  • SVC (Support Vector Classification)
  • SGD (Stochastic Gradient Descent)
  • Decision Tree
  • Random Forest

We trained the models and got the following accuracy scores:
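The training step could look roughly like the sketch below. It uses synthetic data from `make_classification` as a stand-in for the encoded features, and it scores each model on its own training split; which split the project's scores came from is not stated here, so that choice is an assumption. A fully grown decision tree reaches a training accuracy of 1.0, which is why cross-validation is needed to judge the models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded feature matrix (assumption, not real data)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
# X_test / y_test are held out for a final evaluation, not used here
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the scaler on the training split only
X_train_s = StandardScaler().fit_transform(X_train)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "SGD": SGDClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Training-set accuracy for each model
train_scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    train_scores[name] = accuracy_score(y_train, model.predict(X_train_s))

for name, score in sorted(train_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```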

We picked the two best-performing models from above and cross-validated them, using 10-fold cross-validation with 'accuracy' as the scoring metric. We got the following results:
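The 10-fold cross-validation maps directly onto scikit-learn's `cross_val_score` with `cv=10` and `scoring="accuracy"`. The sketch below again uses synthetic stand-in data, so its scores are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training split (assumption, not real data)
X_train, y_train = make_classification(n_samples=400, n_features=10, random_state=0)

candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

cv_scores = {}
for name, model in candidates.items():
    # One accuracy score per fold: 10 folds -> 10 scores
    scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean={scores.mean():.3f} (std={scores.std():.3f})")
```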

The cross-validation results showed that our best-performing models were overfitting. The Random Forest performed better than the Decision Tree under cross-validation, so we chose it for fine-tuning. We used RandomizedSearchCV to tune the following parameters:
  • bootstrap: [True, False]
  • max_depth: [int(x) for x in np.linspace(start=10, stop=110, num=11)]
  • max_features: ["auto", "sqrt"]
  • min_samples_split: [2, 5, 10]
  • min_samples_leaf: [1, 2, 4]
  • n_estimators: [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
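A search over this space, followed by a single test-set evaluation, might look like the sketch below. It is not the project's code and uses synthetic data. Two deliberate deviations from the list above: "auto" is left out of max_features because recent scikit-learn releases (1.3 and later) no longer accept it for random forests (for classifiers it meant the same as "sqrt"), and n_estimators is scaled down so the sketch runs quickly. The values of n_iter and cv are also arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data (assumption, not real data)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_distributions = {
    "bootstrap": [True, False],
    "max_depth": [int(x) for x in np.linspace(start=10, stop=110, num=11)],
    "max_features": ["sqrt"],  # "auto" dropped: equivalent to "sqrt" for classifiers
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    # Scaled down from 200-2000 so the sketch runs quickly
    "n_estimators": [int(x) for x in np.linspace(start=20, stop=200, num=10)],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,           # number of random parameter combinations to try
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refit best model once on the held-out test set
test_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}, test accuracy: {test_acc:.3f}")
```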

Tuning improved the model's accuracy from 0.724 to 0.772. We then applied the tuned model to the test set and obtained an accuracy score of 0.771, meaning that 77.1% of the labels were predicted correctly.

As for predicting race, we concluded that we would need to gather more data to create an acceptable model. For more details, click here.

Please click here for the final conclusion.

Installation:

Contributors

navido89