By default, some_play['players'] is a dictionary that maps each player ID to a list of that player's events. For example (in JSON):
{
  "ydstogo": 5,
  "note": null,
  "qtr": 1,
  "yrdln": "NYG 18",
  "sp": 0,
  "down": 3,
  "players": {
    "0": [
      {
        "playerName": "",
        "clubcode": "NYG",
        "yards": 0,
        "statId": 4,
        "sequence": 1
      },
      {
        "playerName": "",
        "clubcode": "NYG",
        "yards": 0,
        "statId": 6,
        "sequence": 2
      }
    ],
    "00-0023590": [
      {
        "playerName": "G.Sensabaugh",
        "clubcode": "DAL",
        "yards": 0,
        "statId": 79,
        "sequence": 8
      }
    ],
    "00-0022803": [
      {
        "playerName": "E.Manning",
        "clubcode": "NYG",
        "yards": 26,
        "statId": 15,
        "sequence": 3
      },
      {
        "playerName": "E.Manning",
        "clubcode": "NYG",
        "yards": 17,
        "statId": 111,
        "sequence": 4
      }
    ],
    "00-0027265": [
      {
        "playerName": "V.Cruz",
        "clubcode": "NYG",
        "yards": 26,
        "statId": 21,
        "sequence": 5
      },
      {
        "playerName": "V.Cruz",
        "clubcode": "NYG",
        "yards": 0,
        "statId": 115,
        "sequence": 6
      },
      {
        "playerName": "V.Cruz",
        "clubcode": "NYG",
        "yards": 9,
        "statId": 113,
        "sequence": 7
      }
    ]
  },
  "time": "10:50",
  "ydsnet": 31,
  "posteam": "NYG",
  "desc": "(10:50) (Shotgun) E.Manning pass deep left to V.Cruz ran ob at NYG 44 for 26 yards (G.Sensabaugh)."
}
I find this format a little unintuitive: the value of some_play['players'] is really just a list of events, grouped by player in a way that hides their order.
What would you think about adopting a format more like the one below (JSON again)?
{
  "down": 3,
  "note": null,
  "qtr": 1,
  "yrdln": "NYG 18",
  "sp": 0,
  "ydstogo": 5,
  "time": "10:50",
  "ydsnet": 31,
  "events": [
    {
      "playerId": null,
      "playerName": "",
      "statId": 4,
      "yards": 0,
      "clubcode": "NYG"
    },
    {
      "playerId": null,
      "playerName": "",
      "statId": 6,
      "yards": 0,
      "clubcode": "NYG"
    },
    {
      "playerId": "00-0022803",
      "playerName": "E.Manning",
      "statId": 15,
      "yards": 26,
      "clubcode": "NYG"
    },
    {
      "playerId": "00-0022803",
      "playerName": "E.Manning",
      "statId": 111,
      "yards": 17,
      "clubcode": "NYG"
    },
    {
      "playerId": "00-0027265",
      "playerName": "V.Cruz",
      "statId": 21,
      "yards": 26,
      "clubcode": "NYG"
    },
    {
      "playerId": "00-0027265",
      "playerName": "V.Cruz",
      "statId": 115,
      "yards": 0,
      "clubcode": "NYG"
    },
    {
      "playerId": "00-0027265",
      "playerName": "V.Cruz",
      "statId": 113,
      "yards": 9,
      "clubcode": "NYG"
    },
    {
      "playerId": "00-0023590",
      "playerName": "G.Sensabaugh",
      "statId": 79,
      "yards": 0,
      "clubcode": "DAL"
    }
  ],
  "posteam": "NYG",
  "desc": "(10:50) (Shotgun) E.Manning pass deep left to V.Cruz ran ob at NYG 44 for 26 yards (G.Sensabaugh)."
}
The key difference here is that each play boils down to an ordered sequence of "events" (whatever you want to call them), each of which is associated with a player, or with no player at all, in which case playerId is null.
In my mind, the two main advantages of this format are:
- It becomes easy to answer questions like: "Show me all pass completions of at least 26 yards where at least 9 yards came after the catch."
- It would be easier to flatten into a relational structure. Since it is just a list of dictionaries (or, in JSON, an array of objects), it is agnostic to whatever lookup method you prefer, unlike the default structure, which assumes you want to look up events by player ID.
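To make the first point concrete, here is a rough sketch of that query against the proposed format. The meanings I attach to statId 15 (a completion, with the total gain in yards) and statId 113 (yards after the catch) are inferred from the example play above, not taken from any official documentation. The helper name and the second, shorter play are made up for illustration.

```python
# statId meanings are assumptions inferred from the example play above.
PASS_COMPLETION = 15      # passer's completion; `yards` is the total gain
YARDS_AFTER_CATCH = 113   # receiver's yards gained after the catch


def is_long_completion_with_yac(play, min_yards=26, min_yac=9):
    """Return True if `play` contains a completion of at least `min_yards`
    and a yards-after-catch event of at least `min_yac`."""
    events = play["events"]
    long_completion = any(e["statId"] == PASS_COMPLETION and e["yards"] >= min_yards
                          for e in events)
    enough_yac = any(e["statId"] == YARDS_AFTER_CATCH and e["yards"] >= min_yac
                     for e in events)
    return long_completion and enough_yac


plays = [
    # Trimmed copy of the example play in the proposed format.
    {"time": "10:50",
     "events": [
         {"playerId": "00-0022803", "playerName": "E.Manning",
          "statId": 15, "yards": 26, "clubcode": "NYG"},
         {"playerId": "00-0027265", "playerName": "V.Cruz",
          "statId": 113, "yards": 9, "clubcode": "NYG"},
     ]},
    # Hypothetical shorter completion, included only so the filter rejects something.
    {"time": "09:12",
     "events": [
         {"playerId": "00-0022803", "playerName": "E.Manning",
          "statId": 15, "yards": 8, "clubcode": "NYG"},
         {"playerId": "00-0027265", "playerName": "V.Cruz",
          "statId": 113, "yards": 3, "clubcode": "NYG"},
     ]},
]

matches = [p for p in plays if is_long_completion_with_yac(p)]
```

Note that the filter never needs to know which player an event belongs to; it only scans the list.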
Here is one way you could convert the default structure into the proposed one:
def rotate_events(players):
    """
    Flatten `players` into a single list of events ordered by `sequence`.

    The variable `players` is a dictionary of lists of events indexed
    by player ID; e.g.:
        {"00-0023590": [{"playerName": "G.Sensabaugh",
                         "clubcode": "DAL",
                         "yards": 0,
                         "statId": 79,
                         "sequence": 8}],
         "00-0022803": [{"playerName": "E.Manning",
                         "clubcode": "NYG",
                         "yards": 26,
                         "statId": 15,
                         "sequence": 3},
                        {"playerName": "E.Manning",
                         "clubcode": "NYG",
                         "yards": 17,
                         "statId": 111,
                         "sequence": 4}]}
    """
    temp = []
    for player_id, events in players.items():
        for event in events:
            # Append to `temp` a tuple containing the value of
            # event['sequence'] and the reformatted dictionary representing
            # the event; e.g.:
            # (4, {"playerName": "E.Manning", "playerId": "00-0022803",
            #      "clubcode": "NYG", "statId": 111, "yards": 17})
            temp.append((event['sequence'],
                         dict(playerName=event['playerName'] or None,
                              # The player ID is the dictionary key, not a
                              # field of the event. The key "0" holds events
                              # with no player, so it becomes None.
                              playerId=None if player_id == '0' else player_id,
                              clubcode=event['clubcode'],
                              statId=event['statId'],
                              yards=event['yards'])))
    # Return the 2nd element of each tuple after sorting by the first
    return [t[1] for t in sorted(temp, key=lambda t: t[0])]
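
As for the second advantage (flattening into a relational structure), once a play is a flat list of events, loading it into a table takes little more than one executemany call. Below is a minimal sketch using Python's built-in sqlite3 module. The table name play_event and its column names are my own invention, and the events list is a trimmed copy of the example above.

```python
import sqlite3

# Trimmed copy of the example play's events in the proposed format.
events = [
    {"playerId": None, "playerName": "", "statId": 4,
     "yards": 0, "clubcode": "NYG"},
    {"playerId": "00-0022803", "playerName": "E.Manning", "statId": 15,
     "yards": 26, "clubcode": "NYG"},
    {"playerId": "00-0022803", "playerName": "E.Manning", "statId": 111,
     "yards": 17, "clubcode": "NYG"},
    {"playerId": "00-0027265", "playerName": "V.Cruz", "statId": 113,
     "yards": 9, "clubcode": "NYG"},
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per event.
conn.execute("""
    CREATE TABLE play_event (
        seq INTEGER,
        player_id TEXT,
        player_name TEXT,
        stat_id INTEGER,
        yards INTEGER,
        clubcode TEXT
    )
""")

# The position in the list stands in for the old `sequence` field.
rows = [(i, e["playerId"], e["playerName"], e["statId"], e["yards"], e["clubcode"])
        for i, e in enumerate(events, start=1)]
conn.executemany("INSERT INTO play_event VALUES (?, ?, ?, ?, ?, ?)", rows)

# Lookup by player ID is now just one query among many possible ones.
manning_stats = conn.execute(
    "SELECT stat_id FROM play_event WHERE player_id = ? ORDER BY seq",
    ("00-0022803",),
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM play_event").fetchone()[0]
```

Events with no player are stored with a NULL player_id, which matches the null playerId in the proposed JSON.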