Code Monkey home page Code Monkey logo

love_and_basketball's Introduction

A Winning Formula

Note: This repo is still (very much) under construction! Optimizations for some of the clustering methods used in this study can be found in jupyter notebooks in the src directory, along with a number of python scripts used to pull data.

Table of Contents

  1. Abstract
  2. Data
  3. Clustering

Abstract

There’s long been a simmering belief that basketball’s traditional positional labels (point guard, shooting guard, small forward, power forward, center) are insufficient in fully describing a player’s role and activity on the court. Today, more than ever, the issue is bubbling over, and we see players like Giannis Antetokounmpo, Nikola Jokic, et al. bending the rules ascribed to players of their position by traditionalists - we need to adapt our viewing/analyzing paradigms accordingly. Muthu Alagappan famously made an effort to reposition NBA players at Sloan 2012, but his analysis falls short for me. The main issue I find is that a number of his positions fail to impart any sense of how the players of that position actually play on the floor, the most egregious among them, ‘role players’, being a position comprised of roughly the same number of players as about five other positions combined. In all, a rough visual estimate tells me that, in this analysis, roughly one-third of all NBA players wind up in this ‘role player’ position; this is problematic, to say the least.

My aim in this study was initially in two parts - first, redefine player positions using advanced statistics and unsupervised machine learning (namely clustering), second, use these new positions for further analysis and predictive modeling. Data constraints and other factors led to my inability to truly pursue the latter, but this ended up for the best. I pivoted to a clustering-based, purely exploratory look at my data, learning a good deal about a variety clustering algorithms, and, most importantly, about basketball analysis.

Data

Most of my data was pulled from Basketball Reference - the site has a handy 'Get Table as CSV' option which made pulling the data a snap: basketball reference

I pulled not only per 36 minutes stats, but also advanced stats for each qualified player-season over the last 3 years. A player-season was qualified if the player played at least 50 games and at least 10 minutes/game for that season. Both the playing time constraint and the 3 season constraint were fairly restrictive, but ultimately necessary. The playing time minimum was needed to limit noise, while the 3 season limit was intended to ensure my models would capture the positions of today's game.

To bolster my data, I also used ProBasketball API to get data for a number of tracking statistics - total touches per game, distance covered per game, etc.

My final data spanned 911 player-seasons and 53 statistics.

Clustering

Clustering is a messy matter to begin with - in this study, its almost enigmatic nature was only exacerbated regardless of model selection and parameter tuning. Our model is built around NBA players, almost none of whom are abjectly terrible at any one aspect of the game. We would expect players from cluster 1 to be at least serviceable when it comes to the skills that players from cluster 2 thrive at, etc. This fact alone made it hard to quantify success in my clustering, as we would almost expect something like mean silhouette score to have a fairly low cap.

K-Unspecified

DBSCAN

BIRCH

Affinity Propagation

K-Specified

K-Means

Gaussian Mixture Model

Agglomerative Clustering

love_and_basketball's People

Contributors

tilla232 avatar

Stargazers

Nick Loesch avatar

Watchers

James Cloos avatar

Forkers

amandabailey

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.