Projecting non-parametric NFL player performances from ordinal data

Abstract

Statisticians have revolutionized professional baseball and basketball at all levels of team operations. However, effectively modeling players’ performances in the National Football League remains elusive due to three common pitfalls: (1) football games contain many variables, (2) datasets are small, and (3) patterns are non-parametric. I developed a supervised learning approach that seeks to overcome these challenges. My method translates publicly available expert rankings into probability distributions predicting each player’s likely performance in the upcoming week. The distributions are fit using kernel density estimation, followed by a series of processing steps to reduce overfitting and noise. The results below focus on "fantasy points" as the metric of player performance, although the method can be adjusted to predict other dependent variables.

This repo contains the code for the project, which is written in Python. Full details are provided in this white paper. This README illustrates the primary results (the distributions) and then summarizes the repo's code.

Results

The figure just below provides the final distributions for each position.

Notice that the distributions are often not smooth, showing slight humps and plateaus. These humps and plateaus are not noise: smoothing them away lowers model fit on testing data. They arise from discrete aspects of football performance; for example, each touchdown is a discrete outcome worth six fantasy points. See the wide receiver (WR) plot in the top right: there is a first plateau from 8-13 points and a second plateau from 17-21 points. Wide receiver performances at the second plateau were likely associated with one more touchdown than performances at the first plateau. A similar trend is found for tight ends (TE). The scoring sketch below illustrates this spacing.
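As a concrete illustration of that six-point spacing, here is a minimal scoring sketch. It assumes standard PPR-style settings (one point per reception, one per ten yards, six per touchdown); these are assumptions for illustration, and the repo's actual scoring settings may differ.

```python
def fantasy_points(rec_yds, rush_yds, receptions, touchdowns):
    # Standard PPR-style scoring (an assumption; the repo's exact settings
    # may differ): 0.1 per yard, 1 per reception, a flat 6 per touchdown.
    return 0.1 * (rec_yds + rush_yds) + 1.0 * receptions + 6.0 * touchdowns

# Two otherwise identical WR stat lines, one touchdown apart, land exactly
# six points apart -- roughly the gap between the two WR plateaus.
print(fantasy_points(70, 0, 5, 0))  # 12.0
print(fantasy_points(70, 0, 5, 1))  # 18.0
```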

This second figure provides the cumulative distributions for each position.

Code organization

Scraping & organizing data

The code in the scrape_prepare_input directory retrieves historic ranking and performance data. Most notably, it retrieves every FantasyPros expert's ranking of every player across every week of every NFL season from 2013 to 2021; roughly, this corresponds to 60,000 player-weeks and around ten million rankings. All data scraped are public. These files also organize the data into a pandas DataFrame. I prepared documentation for the most pivotal pieces of code, including the modules below (a hypothetical sketch of the scraping loop follows the list):

  • organize.organize_input.py
  • scrape.scrape_fantasypros.py
  • scrape.scrape_scores.py
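To give a feel for the scraping stage's overall shape, here is a hypothetical sketch. The function, argument, and cache names are illustrative only, not the repo's actual API; the real logic lives in scrape.scrape_fantasypros.py.

```python
import os
import pickle

CACHE_DIR = "cache"  # hypothetical name; the actual cache directory is not committed

def scrape_all_rankings(fetch_week, seasons=range(2013, 2022)):
    """Iterate over every (season, week) pair, caching each result so repeated
    runs do not re-request pages. `fetch_week` stands in for the real
    FantasyPros request logic."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    rankings = {}
    for season in seasons:
        for week in range(1, 18):  # regular-season weeks; exact range varies by season
            path = os.path.join(CACHE_DIR, f"{season}_wk{week:02d}.pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    rankings[season, week] = pickle.load(f)
            else:
                rankings[season, week] = fetch_week(season, week)
                with open(path, "wb") as f:
                    pickle.dump(rankings[season, week], f)
    return rankings
```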

Creating the distributions

Based on the scraped data, the code in the make_distributions directory creates probability distributions that predict a player's performance from their expert ranking. These are described in detail in the white paper. I prepared documentation for the most pivotal pieces of code, including:

  • setup_expert_distributions.py
  • density.py
  • rank_concat.py
  • smooth.py
  • test_accuracy.py
  • plot.py

General pipeline

setup_expert_distributions.py runs the pipeline; it calls all of the functions below.

Creating a distribution first involves scraping historic expert ranking data using scrape.scrape_fantasypros.py and historic fantasy-point performance data using scrape.scrape_scores.py. The scraped data are then organized into a single pandas DataFrame using organize.organize_input.py. Each row of the DataFrame represents one player's performance in one week, along with their expert projections for that week.
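A toy example of that one-row-per-player-week layout, with invented values and illustrative column names (not the repo's actual schema):

```python
import pandas as pd

# One row per player-week; the rank columns hold each expert's ordinal ranking
# of that player for that week. All values below are invented placeholders.
df = pd.DataFrame([
    {"season": 2021, "week": 5, "player": "Player A", "pos": "WR",
     "rank_expert_1": 3, "rank_expert_2": 5, "fantasy_points": 21.4},
    {"season": 2021, "week": 5, "player": "Player B", "pos": "WR",
     "rank_expert_1": 28, "rank_expert_2": 31, "fantasy_points": 8.9},
])
```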

After the data are loaded, the DataFrame's contents are further organized into NumPy arrays. Then, functions from density.py are called to create the density distributions. In creating the distributions, density.py calls functions from rank_concat.py to carry out the concatenation procedures described in the white paper. The distributions are then smoothed using functions from smooth.py. Next, the accuracy of the distributions is tested via cross-validation, using functions from test_accuracy.py. Testing accuracy is used to tune the various hyperparameters described in the white paper. Finally, the distributions are plotted using functions from plot.py.
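As a rough illustration of the density-estimation and tuning steps, the sketch below fits a KDE to the fantasy-point outcomes observed at one (position, rank) cell and picks its bandwidth by held-out log-likelihood. The function name, candidate bandwidths, and fold count are all assumptions for illustration; the actual procedures (including the rank concatenation and smoothing passes) are those described in the white paper.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.model_selection import KFold

def fit_rank_density(points, bandwidths=(0.25, 0.5, 1.0, 2.0), n_splits=5):
    """points: 1-D array of fantasy-point outcomes for one (position, rank)
    cell. Returns the KDE whose bandwidth factor maximizes held-out
    log-likelihood under k-fold cross-validation."""
    points = np.asarray(points, dtype=float)
    best_bw, best_ll = None, -np.inf
    for bw in bandwidths:
        ll = 0.0
        for train_idx, test_idx in KFold(n_splits, shuffle=True,
                                         random_state=0).split(points):
            kde = gaussian_kde(points[train_idx], bw_method=bw)
            # Floor the density to avoid log(0) on far-out test outcomes.
            ll += np.log(np.maximum(kde(points[test_idx]), 1e-12)).sum()
        if ll > best_ll:
            best_bw, best_ll = bw, ll
    return gaussian_kde(points, bw_method=best_bw)
```

Note that when scipy's gaussian_kde receives a scalar bw_method, it treats it as a multiplier on the data's spread, so the candidate values above are unitless factors rather than raw point widths.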

Note that neither my scraped data nor the directory I use as a cache has been uploaded to the repo.

Monte Carlo simulation

Sampling from these distributions makes it possible to model the performance of groups of players. The white paper briefly discusses this and its applicability to Daily Fantasy Sports; a minimal sampling sketch follows.
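For intuition only, here is a minimal sketch of such sampling under the simplifying assumption that players' performances are independent; it is not the private simulation code.

```python
import numpy as np

def simulate_lineup_totals(player_kdes, n_sims=100_000, seed=0):
    """player_kdes: fitted scipy.stats.gaussian_kde objects, one per player.
    Returns an (n_sims,) array of simulated lineup point totals."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(n_sims)
    for kde in player_kdes:
        # gaussian_kde.resample draws i.i.d. samples from the fitted density.
        totals += kde.resample(n_sims, seed=rng).ravel()
    return totals

# e.g., np.mean(simulate_lineup_totals(kdes) > 150.0) estimates the
# probability that the lineup clears 150 points.
```

Treating players as independent ignores real correlations (for example, a quarterback and his receivers), which is part of what makes an efficient, realistic large-scale simulation difficult.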

However, the present repository contains only the code for generating the distributions; the Monte Carlo simulation code lives in a private repository. Building an efficient, large-scale simulation is difficult, so if that is your goal, I encourage you to email me ([email protected]).

I am also open to emails if you are interested in sports analytics consulting.
