dsscollection / basketball

License: MIT License

R 79.41% TeX 20.59%

basketball's Introduction

Repository for dsscollection submission "Modeling Offensive Player Movement in Professional Basketball" by Steven Wu and Luke Bornn.

Below are descriptions of the subdirectories of this repo:

analysis: contains all code and materials required to create the PDF of the paper submission. To create the PDF, you will need the knitr, dplyr, ggplot2, raster, grid, and gridExtra packages installed. Then, either (1) open 'article.Rnw' in RStudio and click 'File -> Preview' or (2) open a terminal and run

Rscript -e "library(knitr); knit('./EPV_demo.Rnw')"

data: contains .Rdata files of intermediate data that is helpful for testing changes to the implementation of functions
plots: contains static plots that the article uses, namely plots that were generated over a season's worth of data
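
Before knitting, it can help to confirm that the packages listed for the analysis directory are available. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical.

```r
# Packages named in the build instructions above.
required <- c("knitr", "dplyr", "ggplot2", "raster", "grid", "gridExtra")

# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(required)
if (length(missing) > 0) {
  message("Install before knitting: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

The sketch only reports what is missing and leaves installation to the user, since grid ships with R itself and is not installed from CRAN.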

We would like to acknowledge STATS, LLC for consenting to the inclusion of a full-game data sample.

basketball's People

lucaswu17

basketball's Issues

review of first draft

I really like what you are trying to do here. Your work on this topic has been really well-received, and I think a lot of people will appreciate the kind of nuts-and-bolts how-to that you present here. The concept of starting with the raw data, discussing some of the wrangling challenges, briefly discussing previously-published models, creating interesting visualizations, and discussing results, is perfect. I do have comments on the implementation. :)

Since you have included so much code in the paper (roughly half of the content) you should think carefully about what you show here and what you hope to accomplish. I think there is too much code. While the functions are well-documented, many of them are too long and not that interesting. The code itself is also written in a non-R style with lots of for loops and indexing. I wonder whether much of this code could be greatly shortened by using a dplyr approach. For example, the first function you present is add_possession_columns_to_moments(), and this occupies more than a full page. Can't this be written as a single call to mutate() with some ifelse() statements? If all you are doing is defining two new columns based on the corresponding values of existing columns, then I think this should work. Similarly, in get_moments_w_possession_cols() I wonder if an appeal to dplyr::do() would be a conceptually simpler way to go here.
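
To illustrate the kind of rewrite suggested here, the following is a minimal sketch of defining two derived columns in a single mutate() call. It is not the authors' add_possession_columns_to_moments(); the toy data, the column names (quarter, ball_handler, possession, attacking_right), and the rule that teams switch ends at halftime are all assumptions made for the example.

```r
library(dplyr)

# Toy stand-in for the moments data; all column names here are hypothetical.
moments <- data.frame(
  quarter          = c(1, 1, 3, 3),
  ball_handler     = c("home", "away", "home", NA),
  stringsAsFactors = FALSE
)

moments_w_poss <- moments %>%
  mutate(
    # Team in possession: whoever holds the ball, or "none" for a loose ball.
    possession = ifelse(is.na(ball_handler), "none", ball_handler),
    # Assumed convention: home attacks the right basket in quarters 1-2,
    # and the teams switch ends at halftime.
    attacking_right = ifelse(
      possession == "none", NA,
      (possession == "home") == (quarter <= 2)
    )
  )

print(moments_w_poss)
```

Because mutate() evaluates its arguments in order, the second column can refer to the first, which is what lets the whole operation fit in one call without explicit loops or indexing.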

This raises the larger question of what you hope to accomplish by including the code for these functions. If the goal is simply to be explicit about exactly what you have done, then you have certainly achieved this. However, I don't know how many people will want to sit down and read this code. You might reach a wider audience by putting the code in an R package, documenting the functions there, and then using the space you free up to illustrate how the functions work and how another analyst might use them. You would also have more room for discussing interesting findings. [A sportVU package for R would be pretty exciting!]

I'll leave it to the @jennybc and @hadley to decide whether adding two derivative columns to a data frame is an operation that will be interesting to readers. As a data scientist, I think it's cool, but then it invites me to examine your code and think that it could be done more simply. As a basketball fan, it's not really that interesting.
I like the way that you included the first row of data at the beginning, but the way that this prints out in R makes it very hard to read. I'd suggest using a table with xtable. Since the variables relating to the 10 players are all the same, you could condense this by just showing the non-player related variables, and one set of the four variables describing the i-th player.
I would suggest focusing more on the interpretable results. I found the discussion at the end about the differences in the acceleration patterns of LeBron, Drummond, and Curry to be the most compelling part of the paper. In my opinion this is the "wow" moment when the reader is convinced that they too can learn about basketball through these data. The section on "Movement and Simulation Functions" ends abruptly with no discussion of the results. What did we learn from this?
Can a reader get the SportVU data other than the sample you posted on GitHub? I think a discussion of the availability is warranted.
I strongly recommend including a short section near the beginning that orients the reader to the units and coordinates of these data. For example, are the x's and y's measured in feet? Where is (0,0)? Later you mention that the coordinates are in a [0,47] x [0,50] grid, which I recognize since the court is 94 feet by 50 feet, but most readers will not pick up on that. I'd suggest including a graphic very early in the paper that makes this clear.
There are some moments of non-academic phrasing (e.g. "Let's back it up", "all we have to do", "boils down to", "gridding up"), that could be easily corrected.
If you're going to stick with the "shoulders of past giants" line, I'd suggest weaving in an NBA joke (since the players are so tall!)

In summary, I think this is a really worthwhile effort, but that it would work better as a vignette about what the sportVU package can do and how it works, as opposed to an annotated description of your code.

Review #2

Summary

The authors provide a guide to effectively using SportVU player tracking data in order to analyze player movement in NBA games. The data are new, as well as somewhat raw and hard to work with. The contribution appears to be helping people who want to start working with this (likely) popular source of data and providing some guidance on how to munge, model, and visualize it. The authors contribute a simple model of player movement, predicting position at a time period using their velocity and position at the last time period plus some unobserved acceleration term which can be measured using a model.

Strengths

The authors have clearly thought carefully about the structure of the data and solved many pragmatic problems. Often getting to the point where you can work with large/complex data sets like this is difficult and I appreciate the importance of helping people get started quickly.
The modeling approach for player movement is reasonable. There are many refinements you could envision (hierarchical models to borrow strength across players), incorporating game situation or lineup information, etc. But I think this is a nice introduction to thinking about player modeling.

Weaknesses

The majority of the paper is code for data-munging. While I appreciate the pragmatism here, it seems brittle to provide so many implementation details here, rather than high level function signatures. I'd much prefer to see a brief description of the functions and example input/output dataframes shown as tables. It's very hard to slog through R code inline while reading and I'm sure the reader will likely just use the functions directly anyway. The implementations are much less interesting than what they conceptually do.
In many cases the code is not idiomatic R code. I'd strongly prefer example code that fits the tidy data usage patterns -- it would be more representative of modern R code and more readable.
The visualizations in the paper could use some work. Figure 1 would be clearer with a truncated x-scale. Figures >= 2 are really the star of the show and should be presented first thing in the paper.
In general there should be more visualizations or summarization of the data in order to understand what the data look like. I get very little sense of the format of the data from code, but a few plots with less-heavy modeling could tell a story about what's going on. For instance just a heatmap of positions and velocities (before even measuring acceleration) would be useful to see.
The paper doesn't have a clear objective for success of "modeling player movement". I don't actually know what a good player movement model is, and after reading the paper I'm still not sure. There's a conclusion that the plots based on one game pass the "smell test" but I would challenge the authors to provide a better criterion for the success of this. I understand basketball pretty well, but I struggled to understand that these plots were a successful effort. For instance you might consider it a prediction problem (can I predict where a player will be some time steps ahead?) or might describe some clusters of movement patterns as representing different types of play.
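
The position heatmap suggested in the list above could be sketched as follows. This is an illustration on simulated data only, written in base R so it runs without extra packages; the [0,47] x [0,50] half-court grid in feet is taken from the first review, and the simulated distribution is an arbitrary assumption with no basketball meaning.

```r
set.seed(1)

# Simulated half-court positions in feet; real SportVU moments would replace these.
n <- 5000
x <- pmin(pmax(rnorm(n, mean = 35, sd = 8), 0), 47)
y <- pmin(pmax(rnorm(n, mean = 25, sd = 10), 0), 50)

# Count positions in 1 ft x 1 ft cells.
x_breaks <- seq(0, 47, by = 1)
y_breaks <- seq(0, 50, by = 1)
counts <- table(
  cut(x, x_breaks, include.lowest = TRUE),
  cut(y, y_breaks, include.lowest = TRUE)
)

# Convert the table to a plain numeric matrix for plotting.
z <- matrix(as.numeric(counts), nrow = length(x_breaks) - 1)

# Draw the occupancy heatmap; asp = 1 keeps the court proportions.
image(
  x = x_breaks, y = y_breaks, z = z,
  asp = 1, xlab = "x (ft)", ylab = "y (ft)",
  main = "Simulated position counts"
)
```

The same binning could be applied to speeds instead of counts to get the velocity view the reviewer mentions.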

Minor Points

The list of column names should be presented as a table, not bulleted list.
It would be useful to have a graphical overview of the munging process (e.g., a flowchart) as part of the paper so I could see where it was going at a high level. You might include the function names here.
I suggest that in the conclusion the authors take some time to suggest what an analyst might use these visualizations for. Are they useful for evaluating players? Do they have strategic implications?

Conclusion

I think this is a promising paper from the standpoint of usefulness and impact. This data source is likely to be popular and the solutions to practical problems working with the data are likely to be useful to many. I recommend they read Carson Sievert's "Taming PITCHf/x Data with XML2R and pitchRx" paper, which is a nice prototype for this kind of contribution, and reformat the paper to fit that aesthetic more closely.

More specific title?

Hi @lukebornn @stevenwu4,

I'm working to update the status of everything and initiate getting the entire collection ready.

While looking over a summary of titles, I noticed yours is still somewhat generic:

Modeling Player Movement

Are you interested in putting basketball in the title? Or ... anything that's better advertising for the piece? I'm sure you know better than I.

Jenny's AE review

Hi @lukebornn @stevenwu4,

I just enjoyed reading your paper and the helpful reviews by @beanumber and @seanjtaylor. You've obviously already done a lot of revision in response to those reviews, leaving mercifully little for me to do here.

It would be good if the repo had a README with instructions on, e.g., how to produce the PDF. I managed to get some version of it, but only after removing some setwd() code (hey: check out the solutions for this problem 😄) and changing the results chunk options. I never work with .Rnw so perhaps this was user error, but I just took the path of least resistance in RStudio.

I'll make a few more small suggestions and, once settled, we'll get you to submit this to the TAS system.

Get ready for TAS. You'll need to make sure you're typesetting with a relevant template. Our existing thread about this is here: https://github.com/dsscollection/dsscollection/issues/39.

Some of the references only show up with title and authors. Are those the blog posts? If so, I would expect them to have URL and date, at the very least.

I like the new title. If @lukebornn feels it is appropriate, I think adding his new affiliation with the Kings would provide a nice validation of the relevance and credibility of this work!

The state and distribution channel for the data and code: We can certainly leave it "as is", but might want to consider making a fresh repo for it. Although I am not opposed to simply turning the switch to public on this one.

For data, I suggest you consider depositing in a proper data repository vs leaving as GitHub only. I'm not sure if the sports nature (vs science) makes it difficult to use one of the usual ones, such as Figshare.
I'm glad it looks like you've removed a lot of the inline code and focused more on the high-level interface of some utility functions, as per @beanumber's suggestions. Although it's not coupled to the fate of this article, I'd encourage you to seriously consider the advice to adopt a more idiomatic R style and to make this into an R package that others can use.

About a specific comment from @beanumber:

I'll leave it to the @jennybc and @hadley to decide whether adding two derivative columns to a data frame is an operation that will be interesting to readers.

I think there is a way to make this interesting, at least to many people. I have faced a similar challenge from analyzing ultimate frisbee data. Most people have never thought about the practical problem of using recorded game data to derive higher level game-play-based structure, such as possessions, and inferring who is on offense and which direction they are heading. It's clear you can do it, but I think you show and people will appreciate that it's not completely automatic. That's how I would frame some of that up front work.

Overall suggestion: use present rather than future tense. So "we will address", "we will detail", becomes "we address", "we detail", everywhere.

Table 1: "z coordinates" should not be plural -- should be "z coordinate".

In R, it's more correct to say "package" than "library". I'm thinking of the references to the raster package.

I really liked the intro of @seanjtaylor's review, e.g.:

The authors contribute a simple model of player movement, predicting position at a time period using their velocity and position at the last time period plus some unobserved acceleration term which can be measured using a model.

I think it should inspire a few more sentences around the player movement model and what the terms mean. Stuff like: "A player's position at time t + 1 is modelled as position at time t plus ...." Basically restating the equation in words and helping the reader attach meaning to each term. Especially once I realized how critical the "empirical eta's" were to become, I really wished for some more help from the experts about how I should think about them.
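
One way to read the model described in words above is sketched below. This is an illustration, not the paper's equation: it assumes a one-dimensional position, a constant-velocity step, a frame spacing of 1/25 second, and made-up positions. Under those assumptions the "empirical eta" is the part of the next position that the constant-velocity step fails to explain.

```r
# Hypothetical 1-D positions (feet) of one player at consecutive frames.
dt  <- 1 / 25                      # assumed frame spacing in seconds
pos <- c(10.0, 10.4, 10.9, 11.5, 12.0)

n   <- length(pos)
vel <- diff(pos) / dt              # vel[i]: velocity between frames i and i + 1

# Constant-velocity prediction of frames 3..n from the two preceding frames.
pred <- pos[2:(n - 1)] + vel[1:(n - 2)] * dt

# Empirical eta: what the constant-velocity step fails to explain.
eta <- pos[3:n] - pred

print(eta)
```

In this form eta is simply the second difference of the positions, so it behaves like an acceleration term expressed as a displacement per frame.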

I see @beanumber already pushed back on some of the basketball lingo and sports vocabulary and I gather some changes have already been made. I actually like that, towards the end, the paper gets quite serious about the basketball specifics. And yet I would still like to see more of an effort to bring the wording closer to a "general public" style (at least we're not aiming for "academic journal style"!).

Overall, let me repeat that this is a great case study of what it takes to bring raw SportVU data into an analytical environment, through a statistical model, and into useful visualizations. I look forward to a few more small revisions and submission to TAS.
