

NFL Play Predictor

A binary classifier: Rush vs. Pass

Author: Philip Ramirez
Date: January 2017

Motivation

Most work in sports analytics focuses on game scores, outcomes, or forecasting player performance, but modeling game-time decisions is a complex, yet potentially fruitful, task. Football is a stop-and-start game of difficult decisions based on specific game circumstances, so it is especially advantageous for coaches and offensive coordinators to be as unpredictable as possible. For opposing teams, coaches, and defensive coordinators, a reliable, data-driven prediction of the next play would be invaluable. Furthermore, knowing which salient features drive game-time decisions can ultimately influence defensive preparation as well as in-game strategy.

Data

There were three major steps involved in acquiring my final dataset.

  • The bulk of my data was sourced from NFLSavant.com. It covers full play-by-play data, playoffs and Super Bowls included, for the 2013 through 2016 seasons.
  • I then supplemented this data with features scraped from NFLWeather.com and Pro-Football-Reference.com, including temperature, humidity, visibility, coaches, defensive alignments, and offensive schemes.
  • The test set (data from the 2017 season) was scraped from Pro-Football-Reference.com.
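Supplementing play-by-play rows with game-level features such as weather amounts to a many-to-one join on a shared game key. A minimal pandas sketch, assuming a hypothetical `game_id` column in both tables and hypothetical column names; the scraping step itself is not shown:

```python
import pandas as pd


def attach_game_features(plays, games):
    # Each play row receives the game-level features (e.g. weather) of its game.
    # validate="many_to_one" raises a MergeError if a game_id repeats in `games`.
    return plays.merge(games, on="game_id", how="left", validate="many_to_one")


# Toy tables with hypothetical columns.
plays = pd.DataFrame({"game_id": [1, 1, 2], "down": [1, 2, 1]})
games = pd.DataFrame({"game_id": [1, 2], "temperature": [45, 72], "humidity": [0.6, 0.3]})

merged = attach_game_features(plays, games)
print(merged)
```

A left join keeps every play even when no weather row was scraped for its game; those plays simply get missing values to impute or drop later.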

The cleaned training dataset contains around 120,000 offensive plays from the 2013-2016 seasons. This data was split into training and validation sets in order to tune hyperparameters. Finally, the plays run in 2017 were used as the final test set for model evaluation. After feature engineering and the elimination of redundant features, the final feature set includes 31 columns. Categorical variables were then one-hot encoded (dummied).
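The preparation steps above (dummy encoding, a season-based test split, and a training/validation split) can be sketched with pandas and scikit-learn. This is a minimal sketch, not the project's pipeline: the column names (`season`, `down`, `yards_to_go`, `formation`, `is_pass`), the helper `prepare`, and the 25% validation share are all hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy plays with hypothetical columns; the project's 31 real features are not listed here.
plays = pd.DataFrame({
    "season": [2013, 2014, 2015, 2016, 2016, 2017, 2017, 2015],
    "down": [1, 3, 2, 3, 1, 2, 3, 4],
    "yards_to_go": [10, 7, 4, 2, 10, 6, 8, 1],
    "formation": ["SHOTGUN", "UNDER CENTER", "SHOTGUN", "UNDER CENTER",
                  "SHOTGUN", "UNDER CENTER", "SHOTGUN", "UNDER CENTER"],
    "is_pass": [1, 1, 0, 0, 1, 0, 1, 0],
})


def prepare(plays, categorical=("formation",), val_size=0.25):
    # Dummy-encode before splitting so every split ends up with the same columns.
    encoded = pd.get_dummies(plays, columns=list(categorical))
    features = [c for c in encoded.columns if c not in ("season", "is_pass")]
    # 2013-2016 plays are used for training/validation; 2017 is the held-out test set.
    history = encoded[encoded["season"] <= 2016]
    test = encoded[encoded["season"] == 2017]
    X_train, X_val, y_train, y_val = train_test_split(
        history[features], history["is_pass"], test_size=val_size, random_state=0)
    return X_train, X_val, y_train, y_val, test[features], test["is_pass"]


X_train, X_val, y_train, y_val, X_test, y_test = prepare(plays)
print(X_train.columns.tolist())
```

Splitting the test set by season rather than at random keeps the evaluation honest: the model never sees plays from the year it is scored on.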

Model

The model features fall into 7 major categories. The target is pass: every pass attempt is given a value of 1, and every rush attempt a value of 0. Three classifiers were trained and evaluated; accuracy, precision, and recall are listed for each model:

| Model | Accuracy | Precision | Recall |
| --- | --- | --- | --- |
| Logistic Regression | 0.71 | 0.71 | 0.71 |
| Random Forest Classifier | 0.73 | 0.73 | 0.73 |
| Gradient Boosting Classifier | 0.74 | 0.74 | 0.74 |
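A comparison like the one in the table can be sketched by fitting each scikit-learn classifier and scoring it on a held-out split. The data below is synthetic (`make_classification` with 31 features and a roughly 58% positive class), and the hyperparameters are illustrative assumptions, so the scores will not match the table. The identical accuracy and recall columns are consistent with class-weighted averaging (`average="weighted"`), because weighted-average recall always equals accuracy; that is an inference, not something stated in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the play data: 31 features, about 58% "pass" (class 1).
X, y = make_classification(n_samples=2000, n_features=31, n_informative=10,
                           weights=[0.42, 0.58], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting Classifier": GradientBoostingClassifier(random_state=0),
}


def evaluate(models, X_train, y_train, X_val, y_val):
    # Fit each model, then score its validation predictions with weighted averages.
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        scores[name] = {
            "accuracy": accuracy_score(y_val, pred),
            "precision": precision_score(y_val, pred, average="weighted"),
            "recall": recall_score(y_val, pred, average="weighted"),
        }
    return scores


scores = evaluate(models, X_train, y_train, X_val, y_val)
for name, s in scores.items():
    print(f"{name}: {s['accuracy']:.2f} {s['precision']:.2f} {s['recall']:.2f}")
```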

Evaluation

I've evaluated the Gradient Boosting Classifier's performance against a baseline naive model that always selects pass (the majority class).

| Metric | Value |
| --- | --- |
| Baseline accuracy (always pass) | 0.58 |
| Accuracy | 0.74 |
| Precision | 0.74 |
| Recall | 0.74 |
| F1-score | 0.74 |

Compared with the baseline accuracy of 58%, the Gradient Boosting Classifier's 74% accuracy is 16 percentage points higher, a relative improvement of roughly 28% based on the rounded figures above.
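The always-pass baseline and the size of the improvement can be reproduced in a few lines. A minimal sketch, assuming labels coded as in the Model section (1 = pass, 0 = rush); it uses scikit-learn's `DummyClassifier` with `strategy="most_frequent"` as the naive model, and the helper names `baseline_accuracy` and `improvement` are hypothetical. With the rounded table values, the relative gain works out to about 27.6%; unrounded scores may differ slightly.

```python
import numpy as np
from sklearn.dummy import DummyClassifier


def baseline_accuracy(y):
    # Majority-class baseline: always predict the most frequent label (pass = 1 here).
    X_dummy = np.zeros((len(y), 1))  # features are ignored by this strategy
    clf = DummyClassifier(strategy="most_frequent").fit(X_dummy, y)
    return clf.score(X_dummy, y)


def improvement(model_acc, base_acc):
    # Absolute gain in percentage points and relative gain (%) over the baseline.
    return (model_acc - base_acc) * 100, (model_acc / base_acc - 1) * 100


y = np.array([1] * 58 + [0] * 42)  # toy labels: 58% pass plays
print(baseline_accuracy(y))  # 0.58
points, relative = improvement(0.74, 0.58)
print(f"{points:.0f} points, {relative:.1f}% relative")  # 16 points, 27.6% relative
```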

Insights

One advantage of tree-based models is that they expose feature importances. After model evaluation, the fundamentals of football proved to be the most important features.
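Fitted scikit-learn tree ensembles expose impurity-based importances through the `feature_importances_` attribute. A minimal sketch of ranking them; the helper `ranked_importances` and the toy columns are hypothetical, and the label is built from `yards_to_go` alone so the expected ranking is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier


def ranked_importances(model, feature_names):
    # Impurity-based importances from a fitted tree ensemble, highest first.
    return (pd.Series(model.feature_importances_, index=feature_names)
              .sort_values(ascending=False))


# Toy data: only yards_to_go determines the label; temperature is pure noise.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "yards_to_go": rng.integers(1, 20, 500),
    "temperature": rng.normal(60, 15, 500),
})
y = (X["yards_to_go"] > 6).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
ranked = ranked_importances(model, X.columns)
print(ranked)
```

Impurity-based importances can favor high-cardinality features; scikit-learn's `permutation_importance` is a common cross-check.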

Formation

70% of plays run from the shotgun formation are pass plays. Plays run from under center tend to be split more evenly between rushes and passes.
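A pass rate per formation is a grouped mean of the 0/1 target. A minimal pandas sketch with hypothetical column names (`formation`, `is_pass`) and toy numbers chosen to echo the text:

```python
import pandas as pd


def pass_rate_by(plays, column):
    # Mean of the 0/1 target per group = share of plays that are passes.
    return plays.groupby(column)["is_pass"].mean()


plays = pd.DataFrame({
    "formation": ["SHOTGUN"] * 10 + ["UNDER CENTER"] * 10,
    "is_pass": [1] * 7 + [0] * 3 + [1] * 5 + [0] * 5,
})
rates = pass_rate_by(plays, "formation")
print(rates)  # toy data: SHOTGUN 0.7, UNDER CENTER 0.5
```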

Down and Distance

The down and the number of yards to the next first down are both positively correlated with pass attempts. In the last 5 years, plays run on third down with at least 3 yards to go were 80% pass attempts.
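The third-down figure is a filtered pass rate. A minimal sketch, assuming hypothetical columns `down`, `yards_to_go`, and `is_pass`; the toy rows are made up and do not reproduce the 80% figure.

```python
import pandas as pd


def third_and_long_pass_rate(plays, min_yards=3):
    # Pass share on third down with at least `min_yards` to go.
    mask = (plays["down"] == 3) & (plays["yards_to_go"] >= min_yards)
    return plays.loc[mask, "is_pass"].mean()


plays = pd.DataFrame({
    "down":        [3, 3, 3, 3, 3, 1, 2],
    "yards_to_go": [5, 8, 3, 4, 1, 10, 3],
    "is_pass":     [1, 1, 1, 0, 0, 0, 1],
})
print(third_and_long_pass_rate(plays))  # 0.75 (four qualifying plays, three passes)
```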

Time left in half

One of the most important features is the time remaining in each half of a game. Quarters 1 and 3 tend to be conservative (teams run the ball more), whereas quarters 2 and 4 are pass-heavy. The most interesting part of this graphic is the observable dip in pass plays right after the start of overtime. The very conservative play calling in this period suggests that teams are trying not to lose!
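Time left in the half is not a raw play-by-play field; it has to be derived from the quarter and the game clock. A minimal sketch, assuming 15-minute quarters and treating any overtime period (quarter 5 or later) as its own period; the function name and the overtime handling are hypothetical choices, not taken from the project.

```python
def seconds_left_in_half(quarter, minutes, seconds, quarter_length=15 * 60):
    """Hypothetical feature: game-clock seconds remaining in the current half.

    quarter: 1-4 for regulation, 5 or higher for overtime (treated as its own period).
    minutes, seconds: the game clock within the current quarter.
    """
    if quarter < 1:
        raise ValueError("quarter must be 1 or greater")
    clock = minutes * 60 + seconds
    if quarter in (1, 3):
        # A full quarter still remains in the half after this one ends.
        return clock + quarter_length
    # Quarters 2 and 4 close out their half; overtime uses only its own clock.
    return clock


print(seconds_left_in_half(1, 10, 0))  # 1500
print(seconds_left_in_half(4, 2, 0))   # 120
```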

Tech Used

Python

  • Pandas and NumPy for data processing and feature engineering
  • scikit-learn: Logistic Regression, Random Forest Classifier, Gradient Boosting Classifier, and Grid Search

Database

  • MongoDB for data storage/querying

AWS

  • IPython clusters of up to 10 EC2 instances to run grid searches over the evaluated models
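The grid searches themselves can be sketched with scikit-learn's `GridSearchCV`. The project distributed them over IPython clusters on EC2; this minimal sketch instead runs locally, with `n_jobs=-1` spreading the fits across local CPU cores. The synthetic data and the parameter grid are illustrative assumptions, not the grid actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic stand-in for the 31-column play data.
X, y = make_classification(n_samples=600, n_features=31, random_state=0)

# Illustrative grid: 2 x 2 x 2 = 8 candidates, each fit on 3 folds (24 fits).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid,
                      cv=3, scoring="accuracy", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The number of fits grows multiplicatively with each added parameter, which is why distributing them across a cluster pays off for larger grids.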

Future Work

  • Player packages for each play
  • Baseball: another stop-and-start sport with complex in-game decisions
