roclark / sportsipy
A free sports API written for Python
License: MIT License
Describe the bug
Both the NCAAB and NCAAF Boxscores classes are missing fields that were intended to be included with the previous release, namely the winners' and losers' names and abbreviations.
To Reproduce
from datetime import datetime
from sportsreference.ncaab.boxscore import Boxscores as NCAAB
from sportsreference.ncaaf.boxscore import Boxscores as NCAAF
b = NCAAB(datetime.today())
print(b.games)
b = NCAAF(datetime.today())
print(b.games)
Expected behavior
I expect the following fields, as indicated in the documentation, to be included in the games property for both NCAAB and NCAAF: winning_name, winning_abbr, losing_name, losing_abbr.
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
Description
NCAAB postseason games are missing the home team's name and abbreviation when retrieved through the Boxscores object (regular season games on the same date have all the info). The winning or losing team's name and abbreviation are also missing, depending on whether the home team won or lost.
To Reproduce
from sportsreference.ncaab.boxscore import Boxscores
import pandas as pd
daily_schedule = Boxscores(pd.to_datetime('3/14/19')).games
schedule = pd.DataFrame(daily_schedule['3-14-2019'])
Expected behavior
Team data is fully populated as it is with regular season games.
Not the biggest deal but thought you might like to know.
First, wonderful project! I personally would find it useful if I could pull all boxscores in some date window. Neither Boxscore, nor Boxscores is really equipped to do this elegantly. Looks like I'd have to loop over teams, then team schedules, then query boxscore stats using their boxscore id. It would be great if there was a dedicated function to do this.
Examples:
List of all timestamp sorted boxscores 2009--present
List of all boxscores available
List of all boxscores this year
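A minimal sketch of how a date window might be walked one day at a time, assuming some per-day query already exists. The names `iter_days`, `collect_boxscores`, and `fetch_day` are hypothetical and not part of the library; `fetch_day` stands in for whatever call returns the games for a single date.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def collect_boxscores(start: date, end: date,
                      fetch_day: Callable[[date], List[dict]]
                      ) -> Dict[str, List[dict]]:
    """Gather per-day results into one dict keyed by ISO date string.

    fetch_day is a hypothetical stand-in for a real per-day query.
    """
    results: Dict[str, List[dict]] = {}
    for day in iter_days(start, end):
        games = fetch_day(day)
        if games:  # skip days on which no games were found
            results[day.isoformat()] = games
    return results
```

Because the keys are ISO date strings, iterating over `sorted(results)` gives the games in chronological order.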
Hi!
It seems the URL for the Vegas roster is wrong. I get an error showing that it's calling https://www.hockey-reference.com/teams/veg/2019.html instead of https://www.hockey-reference.com/teams/VEG/2019.html
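One way to guard against this kind of case mismatch is to normalize the abbreviation before building the URL. This is only a sketch: the URL pattern is taken from the two links above, and `roster_url` is a hypothetical helper, not the library's own function.

```python
# URL pattern inferred from the links in this report (an assumption).
TEAM_URL = 'https://www.hockey-reference.com/teams/%s/%s.html'


def roster_url(abbreviation: str, year: int) -> str:
    """Build a team page URL, forcing the abbreviation to upper case."""
    return TEAM_URL % (abbreviation.upper(), year)
```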
Describe the bug
Regardless of the year, the NCAAB rankings always show up as empty. No errors are thrown, but something is clearly wrong.
To Reproduce
from sportsreference.ncaab.rankings import Rankings
for year in range(2000, 2019):
    rankings = Rankings(str(year))
    print(rankings.complete)
{0: []}
{0: []}
...
{0: []}
Expected behavior
I would expect the above code to return rankings that follow the format listed in the documentation, such as:
{
week number, ie 19 (int): [
{
'abbreviation': Team's abbreviation, such as 'PURDUE'
(str),
'name': Team's full name, such as 'Purdue' (str),
'rank': Team's rank for the current week (int),
'week': Week number for the results, such as 19 (int),
'date': Date the rankings were released, such as
'2017-03-01'. Can also be 'Final' for the final
rankings or 'Preseason' for preseason rankings
(str),
'previous': The team's previous rank, if applicable
(str),
'change': The amount the team moved up or down the
rankings. Moves up the ladder have a positive
number while drops yield a negative number
and teams that didn't move have 0 (int)
},
...
],
...
}
Additional context
It appears sports-reference.com changed the layout of the standard rankings page, which is not compatible with the Rankings class. The old version is still saved on the site with a URL along the lines of https://www.sports-reference.com/cbb/seasons/2019-polls-old.html. The old page should be used if possible to reduce the amount of code refactoring.
class sportsreference.ncaab.teams.Team(team_data, team_conference=None, year=None)
Would like to query team using the param abbr
Eg. 'PURDUE'
Describe the bug
I'm getting an XML parsing error when I loop through the rosters for 2019, but not for any other year.
To Reproduce
Sample code which causes an issue.
from sportsreference.ncaab.roster import Roster
bama = Roster('ALABAMA')
for player in bama.players:
    # Prints the name of all players who played for Alabama in the most
    # recent season.
    print(player.name)
print("At least we've got football")
Expected behavior
Print each player's name.
Additional context
I'm really wondering if this is just on my setup or if Sports Reference is doing something different on their site. Here's the link: https://www.sports-reference.com/cbb/schools/alabama/2019.html
This happens to me on the sample code too; the Bama fan in me just changed the team :)
This really is an incredible project!
Describe the bug
Schedule fails to load due to date not being in the correct datetime format
To Reproduce
Sample code which causes an issue.
from sportsreference.ncaaf.schedule import Schedule, Game
Schedule('MARYLAND', 2007).dataframe
Expected behavior
Schedule loads for the given team and year
Additional context
https://www.sports-reference.com/cfb/schools/maryland/2007-schedule.html
The issue is caused because the "Time" column does not exist prior to the 2013 season.
A hot fix can be implemented by adding "or self._time is None" to the condition at line 193 in schedule.py, but there may be more elegant ways of doing it.
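As an illustration of the idea behind the hot fix, the sketch below parses a schedule date and only folds in the time when one is present. It is not the code in schedule.py; `parse_game_datetime` is a hypothetical helper, and the date and time formats ('Sep 1, 2007' and '7:30 PM') are assumptions made for the example.

```python
from datetime import datetime
from typing import Optional


def parse_game_datetime(date_text: str,
                        time_text: Optional[str]) -> datetime:
    """Parse a schedule date, using the time only when one is available."""
    if not time_text:
        # Seasons without a "Time" column fall back to the date alone.
        return datetime.strptime(date_text, '%b %d, %Y')
    combined = '%s %s' % (date_text, time_text)
    return datetime.strptime(combined, '%b %d, %Y %I:%M %p')
```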
Is your feature request related to a problem? Please describe.
It's more me being a newbie, but I don't understand yet how to take the data here and be able to write it to a csv
Describe the solution you'd like
Additional tutorial in the docs in addition to the great job that has already been completed.
Describe alternatives you've considered
Pasting to notepad
Additional context
Appreciate your time, this is an amazing project.
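For readers with the same question, here is a minimal sketch of writing rows of data to CSV using only the standard library. The rows are made-up placeholders that borrow team names from elsewhere in this thread, and `rows_to_csv` is a hypothetical helper rather than part of the package.

```python
import csv
import io


def rows_to_csv(rows, stream):
    """Write a list of dicts to stream as CSV, with a header row."""
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Placeholder rows; real values would come from the library's objects.
rows = [
    {'name': 'Purdue', 'abbreviation': 'PURDUE'},
    {'name': 'Alabama', 'abbreviation': 'ALABAMA'},
]

buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open('teams.csv', 'w', newline='')`. For objects that expose a pandas DataFrame through a dataframe property, calling `to_csv('box.csv')` on that DataFrame, as another report in this thread does, is the shorter route.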
Describe the bug
For certain teams in the 2017-18 NCAAB season (and a couple previous seasons), the Roster object won't load (see example teams below). I think it has to do with broken links to player pages for players who appear on the given team's roster.
To Reproduce
from sportsreference.ncaab.roster import Roster, Player
roster = Roster(team = 'houston', year = 2018, slim = False)
roster = Roster(team = 'virginia-tech', year = 2018, slim = False)
Expected behavior
Rosters to be returned like they are for the vast majority of NCAAB teams in most seasons.
Additional context
As mentioned above, this appears to be an issue with teams having players whose player pages have broken links. I looked at the 2 teams mentioned above and found these "broken" links to player pages (at least they appear broken to me):
Appreciate if these can be fixed for these teams and others!
Is your feature request related to a problem? Please describe.
It's not a problem so much as a missing piece of what I think is relevant data
Describe the solution you'd like
That field added to the endpoint
Describe alternatives you've considered
n/a
Additional context
n/a
Describe the bug
Everything looks good, except the column under stadium is empty, and the column for time has the stadium name and location in the row.
To Reproduce
Sample code which causes an issue.
from sportsreference.ncaaf.boxscore import Boxscore
import csv
game_data = Boxscore('2018-01-08-georgia')
print(game_data.home_points) # Prints 23
print(game_data.away_points) # Prints 26
df = game_data.dataframe # Returns a Pandas DataFrame of game metrics
print (df)
df.to_csv('box.csv')
Expected behavior
Stadium name to be in stadium column and time to be in time column
Screenshots
N/A
Additional context
N/A
Describe the bug
A TypeError is thrown when grabbing NCAAB stats from Teams when the stat is a difference between two stats, such as defensive_rebounds, opp_defensive_rebounds, and net_rating. I believe the issue is that one or both of the values are missing. When a value is missing (i.e. the stat is not listed on sports-reference.com), it is set to None and the TypeError occurs.
To Reproduce
Sample code which causes an issue.
from sportsreference.ncaab.teams import Teams
teams=Teams(2000)
Additional context
The code above works for more recent years, but for earlier years (before 2009, I believe) some of the data is not available from sports-reference.com, which is what is causing the error.
A Possible Fix
An example of a possible fix is shown below. The defensive_rebounds, opp_defensive_rebounds, and net_rating definitions in teams.py file in ncaab need to be modified to try to perform the calculation first. In the event that it fails, set the value to None.
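def defensive_rebounds(self):
    """
    Returns an ``int`` of the total number of defensive rebounds during the
    season, or ``None`` if either underlying stat is unavailable.
    """
    try:
        return self.total_rebounds - self.offensive_rebounds
    except TypeError:
        # One or both values are None because the stat is not listed.
        return None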
A similar try/except will need to be performed for opp_defensive_rebounds, and net_rating as well.
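Since the same pattern would be repeated for three properties, one alternative is a small shared helper that checks for None explicitly instead of catching the exception. This is a sketch only; `safe_difference` is a hypothetical name and does not exist in teams.py.

```python
from typing import Optional


def safe_difference(left: Optional[float],
                    right: Optional[float]) -> Optional[float]:
    """Return left - right, or None when either value is missing."""
    if left is None or right is None:
        return None
    return left - right
```

Each property could then return, for example, `safe_difference(self.total_rebounds, self.offensive_rebounds)`.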
Describe the bug
The latest version of sportsreference (0.3.0) throws an error when pulling boxscores that don't have a score associated with them yet. The stack trace looks like the following:
  File "t.py", line 4, in <module>
    b = Boxscores(datetime.today())
  File "/home/roclark/sportsreference/sportsreference/ncaab/boxscore.py", line 1212, in __init__
    self._find_games(date, end_date)
  File "/home/roclark/sportsreference/sportsreference/ncaab/boxscore.py", line 1546, in _find_games
    boxscores = self._extract_game_info(games)
  File "/home/roclark/sportsreference/sportsreference/ncaab/boxscore.py", line 1493, in _extract_game_info
    names = self._get_team_names(game)
  File "/home/roclark/sportsreference/sportsreference/ncaab/boxscore.py", line 1458, in _get_team_names
    away_score = self._get_score(scores[0])
IndexError: list index out of range
To Reproduce
A call to the Boxscores class for any date that is not in the past (i.e. today or a future date) will throw the error.
from datetime import datetime
from sportsreference.ncaab.boxscore import Boxscores
b = Boxscores(datetime.today())
print(b.games)
Expected behavior
The boxscore should still be parsed, but the winner, loser, and score should all be None in this case. Just because a score can't be associated with the game doesn't mean an error should be thrown, as there is currently no other way to query future games using sportsreference outside of pulling the team's schedule.
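A rough sketch of the expected behavior, assuming the scraped scores arrive as a list of strings that is empty for games not yet played. The function names are hypothetical and this is not the code in boxscore.py.

```python
from typing import List, Optional, Tuple


def extract_scores(scores: List[str]
                   ) -> Tuple[Optional[int], Optional[int]]:
    """Return (away, home) scores, or (None, None) if no score exists yet."""
    if len(scores) < 2:
        return None, None
    try:
        return int(scores[0]), int(scores[1])
    except ValueError:
        # Non-numeric text where a score was expected.
        return None, None


def pick_winner(away_name: str, home_name: str,
                away_score: Optional[int], home_score: Optional[int]
                ) -> Tuple[Optional[str], Optional[str]]:
    """Return (winner, loser), or (None, None) when scores are missing."""
    if away_score is None or home_score is None:
        return None, None
    if away_score > home_score:
        return away_name, home_name
    # A tie would fall through to the home team here; real code would
    # need to decide how to treat that case.
    return home_name, away_name
```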
There is an outstanding issue with the Boxscore classes where game information for older seasons (generally 5+ years ago) is incorrectly parsed due to a lack of information displayed on the page. For example, an NHL game from 2005 contains much less meta-information than one from 2018. The Boxscore class expects a certain number of rows to be located on the page. If an item is missing, the order can be thrown off and information can be put into the wrong categories, such as the game's date getting stored in the attendance property.
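One way to avoid position-dependent parsing is to key each row by its label, so a missing row simply yields no value instead of shifting the others. The sketch below assumes the rows can be obtained as 'Label: value' strings, which is an assumption about the page and not something every row is known to satisfy; `parse_meta` is a hypothetical helper.

```python
from typing import Dict, List


def parse_meta(rows: List[str]) -> Dict[str, str]:
    """Map 'Label: value' rows to a dict keyed by the lower-cased label."""
    meta: Dict[str, str] = {}
    for row in rows:
        # Split on the first colon only, so values such as times survive.
        label, separator, value = row.partition(':')
        if separator:
            meta[label.strip().lower()] = value.strip()
    return meta
```

Looking a field up with `meta.get('attendance')` then returns None when the row is absent, rather than picking up a neighboring row's value.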
Describe the bug
Slow response: it takes 25-30 seconds on average to retrieve the list of player IDs in a team roster.
To Reproduce
from sportsreference.nba.roster import Roster
def get_roster(req, res):
    param1 = req.query_params['team']
    team = Roster(param1)
    players = team.players
    res.media = [player.player_id for player in players]
Is your feature request related to a problem? Please describe.
Currently, the Boxscores classes are intended to be forward-looking in the sense that they don't check or save the results of completed games. Being able to show scores and the winners would allow many opportunities to handle the information in unique ways and can be considered a critical feature.
Describe the solution you'd like
The Boxscores class for all 6 sports currently supported in sportsreference should include properties to indicate scores and winners for every game that has completed.
Describe alternatives you've considered
As the Boxscore class already has this information, and using Boxscores is the best way to query multiple games, the functionality is most clearly missing from Boxscores and should be placed there.
Additional context
The following properties should be added to the Boxscores classes:
winning_name: Full name of the winning team
winning_abbr: Abbreviation of the winning team
losing_name: Full name of the losing team
losing_abbr: Abbreviation of the losing team
home_score: Integer score for the home team
away_score: Integer score for the away team
Hi @roclark! Really like the project. What do you think of supporting local HTML files that have been downloaded from sports-reference in advance?
Could be nice to let users specify that they've pre-downloaded certain resources through some kind of API configuration, maybe with a mapping like {'some-resource-id': 'path_to_resource_page.html'}
After looking through the code a bit, maybe this could happen in utils.py with some new function that gets a document, choosing between PyQuery(url=x) and PyQuery(filename=x)?
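A minimal sketch of how that choice might look, assuming a user-supplied mapping from resource ID to local path as described above. `resolve_source` and its parameters are hypothetical; nothing like it is known to exist in utils.py. The existence check is injectable so the logic can be exercised without touching the filesystem.

```python
import os
from typing import Callable, Dict, Tuple


def resolve_source(resource_id: str, url: str,
                   local_pages: Dict[str, str],
                   exists: Callable[[str], bool] = os.path.isfile
                   ) -> Tuple[str, str]:
    """Return ('filename', path) for a usable local copy, else ('url', url)."""
    path = local_pages.get(resource_id)
    if path and exists(path):
        return 'filename', path
    # No mapping, or the mapped file is missing: fall back to the network.
    return 'url', url
```

The returned pair could then be handed to PyQuery as a keyword argument, along the lines of `PyQuery(**{kind: location})`, which matches the two call forms mentioned in the proposal.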
I am just trying to get basic data on all MLB teams, but I get a urllib.error.HTTPError
from sportsreference.mlb.teams import Teams
teams = Teams()
for team in teams:
    print(team.name)
Expected behavior
Instead of printing each team's name I get
urllib.error.HTTPError: HTTP Error 404: Not Found
Does this project comply with SR's data use policy?
This serves as a notice that support for Python 3.4 will be dropped on the first release on or after March 1, 2019. This is to coincide with Python 3.4 officially going end-of-life on March 16, 2019 and to encourage users to transition to a newer version of Python. Any issues that are raised for newer versions of sportsreference that are caused by Python 3.4 will be ignored, as users will be asked to update their Python version or use an older version of sportsreference that supports Python 3.4.
hey, this is an awesome project, love the idea. was literally about to start scraping before i looked around for something.
any plans to include contract/salary 💰data?
I noticed that in your documentation you say Schedule() returns all scheduled games for any given team in any league. I am finding that this function returns all played games from a given year. I am sorry if this is already possible or if I am simply doing something wrong, but I would like to be able to pull in upcoming games on a given schedule as well.
Is your feature request related to a problem? Please describe.
No
Describe the solution you'd like
Boxscore_ID should be updated to include a playoff vs. regular season indicator.
Describe alternatives you've considered
I can manually assign the indicator using the date but it would be nice for it to be built in.
This is how I made a makeshift indicator.
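from sportsreference.nba.schedule import Schedule as nba_schedule

def playoff_ind(i):
    # Games 1-82 are treated as regular season, anything later as playoffs.
    return 'N' if i + 1 <= 82 else 'Y'

sched = nba_schedule('BOS')
games = []
for i, game in enumerate(sched):
    games.append(game.boxscore_index + playoff_ind(i))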
I've successfully run through your examples for getting NFL Player data (i.e. Brees), but when I look to get data on other players (specifically, receivers like Julio Jones or Wes Welker), there seems to be very little actual specific data (i.e. receiving_yards, etc.). Simple example:
from sportsreference.nfl.roster import Player
jones = Player('JoneJu02')
stats = jones.dataframe[["games", "receptions"]]
stats
games receptions
2011 13 None
2012 16 None
2013 5 None
2014 15 None
2015 16 None
2018 16 None
Career 111 None
Totally open to the idea I'm just missing something basic, so apologies in advance if that's the case. Thanks.
This serves as a notice that support for Python 2.7 will be dropped on the first release on or after July 1, 2019. This is to prepare for Python 2.7 officially going end-of-life on January 1, 2020 and to encourage users to transition to a newer version of Python. Any issues that are raised for newer versions of sportsreference that are caused by Python 2.7 will be ignored, as users will be asked to update their Python version or use an older version of sportsreference that supports Python 2.7.
Is your feature request related to a problem? Please describe.
The Teams interface can be a bit clunky when you're learning to use the module. For instance, running:
from sportsreference.mlb.teams import Teams
teams = Teams(2017)
print(teams)
bombs at the printing statement:
TypeError: str returned non-string (type list)
On the other hand, if running the following code:
my_team = teams("HOU")
print(my_team)
then I can see that I have a Team object. However, if I go after the schedule with:
my_team.schedule
then I get another type error. I can get what I want by specifying that I want the dataframe attribute, but this is not intuitive:
hou_schedule = my_team.schedule
df1 = hou_schedule.dataframe
print(df1)
Describe the solution you'd like
After:
teams = Teams(2017)
I think one should be able to
Describe alternatives you've considered
The first section describes a workaround. The package is working great. The interfaces could use some work.
Traceback (most recent call last):
  File "./scrape.py", line 28, in
    sched = getScheduleDF(team, yr)
  File "./scrape.py", line 15, in getScheduleDF
    df = sched.dataframe_extended
  File "/home/chris/.local/lib/python3.6/site-packages/sportsreference/ncaaf/schedule.py", line 459, in dataframe_extended
    frames.append(game.dataframe_extended)
  File "/home/chris/.local/lib/python3.6/site-packages/sportsreference/ncaaf/schedule.py", line 159, in dataframe_extended
    return self.boxscore.dataframe
  File "/home/chris/.local/lib/python3.6/site-packages/sportsreference/ncaaf/boxscore.py", line 240, in dataframe
    'away_first_downs': self.away_first_downs,
  File "/home/chris/.local/lib/python3.6/site-packages/sportsreference/ncaaf/boxscore.py", line 374, in away_first_downs
    return int(self._away_first_downs)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Is your feature request related to a problem? Please describe.
I'm sure you've considered this, but being able to get individual player stats for a given game would be wonderful. Particularly for the NCAA basketball module, though I'm sure all of them could use it.
Describe the solution you'd like
I'd guess that it'd be easiest to do this within the BoxScore class, but I suppose you could also do it by passing in the boxscore URI to a Player object similar to how you do it by season now. Just by glancing at the code for it, it seems like the former would be easier than the latter, though perhaps not as elegant.
Describe alternatives you've considered
I've looked just about everywhere for free NCAAB player data, and I'm pretty sure the only real solution right now is to scrape. Which while not necessarily difficult, is annoying given how often most stats sites change. This is by far the most complete API with any sort of wrapper that I've seen, so I really appreciate what you've done already. Even if you are a Carsen Edwards fan ;)
It'd be great to hear if this is in your plans at all.
Describe the bug
For 2 teams in the 2014-15 NCAAB season (see code/screenshot below), the "dataframe" associated with the Schedule object won't load. The error messages seem to indicate that this is caused by a missing game time for a single game on each team's schedule.
For all teams I've tried prior to the 2014-15 season, the dataframe associated with the Schedule object won't load either. This shows a different error but also seems related to missing data, as no teams in those earlier seasons have times on their schedules (it appears to be something that Sports Reference only has going back a few years). In this case, you can still access sched.dataframe_extended for those teams, which has different columns than sched.dataframe, which makes me think this has to do with exception handling around missing data as well.
To Reproduce
from sportsreference.ncaab.schedule import Schedule
sched = Schedule(abbreviation = 'cleveland-state', year = 2015)
sched_df = sched.dataframe
sched_df
sched = Schedule(abbreviation = 'savannah-state', year = 2015)
sched_df = sched.dataframe
sched_df
sched = Schedule(abbreviation = 'kentucky', year = 2014)
sched_df = sched.dataframe
sched_df
Expected behavior
The schedule data frame to be accessible for these teams like it is for the vast majority of NCAAB teams in 2014-15 and forward.
As seen in the pycodestyle repository, PEP8 is now a deprecated tool and does not support newer Python code styling practices. In order to stay up-to-date with Python code styling, PEP8 should be replaced with pycodestyle. Prior to closing, any new warnings or errors that are thrown from pycodestyle that were not raised by PEP8 should be resolved.
In order to improve the overall community for sportsreference, several items should be added:
Description
Cannot instantiate schedule object. Receive "TypeError: 'NoneType' object is not iterable"
To Reproduce
from sportsreference.nba.schedule import Schedule
boston_schedule = Schedule('BOS')
Expected behavior
Expect to be able to use object to iterate over desired team's schedule.