vishaalagartha / basketball_reference_scraper
A python module for scraping static and dynamic content from Basketball Reference.
License: MIT License
The error I'm running into:
ERROR: Command errored out with exit status 1:
command: /Library/Developer/CommandLineTools/usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/tmp/pip-install-adjqudls/pandas/setup.py'"'"'; file='"'"'/private/tmp/pip-install-adjqudls/pandas/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /private/tmp/pip-install-adjqudls/pandas/
Complete output (101 lines):
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_philox.pyx
Traceback (most recent call last):
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 59, in process_pyx
from Cython.Compiler.Version import version as cython_version
ModuleNotFoundError: No module named 'Cython'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 235, in <module>
main()
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 231, in main
find_process_files(root_dir)
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 222, in find_process_files
process(root_dir, fromfile, tofile, function, hash_db)
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 188, in process
processor_function(fromfile, tofile)
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 64, in process_pyx
raise OSError('Cython needs to be installed in Python as a module')
OSError: Cython needs to be installed in Python as a module
Running from numpy source directory.
/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py:485: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
run_build = parse_setuppy_commands()
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 154, in save_modules
yield saved
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 250, in run_setup
_execfile(setup_script, ns)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 45, in _execfile
exec(code, globals, locals)
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 513, in <module>
# we can't do anything about these warnings because they stem from
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 493, in setup_package
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 290, in generate_cython
for pxifile in _pxifiles:
RuntimeError: Running cythonize failed!
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/tmp/pip-install-adjqudls/pandas/setup.py", line 809, in <module>
setup(
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 144, in setup
_install_setup_requires(attrs)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires
dist.fetch_build_eggs(dist.setup_requires)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/dist.py", line 716, in fetch_build_eggs
resolved_dists = pkg_resources.working_set.resolve(
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 780, in resolve
dist = best[req.key] = env.best_match(
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1065, in best_match
return self.obtain(req, installer)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1077, in obtain
return installer(requirement)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/dist.py", line 786, in fetch_build_egg
return cmd.easy_install(req)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 679, in easy_install
return self.install_item(spec, dist.location, tmpdir, deps)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 705, in install_item
dists = self.install_eggs(spec, download, tmpdir)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 890, in install_eggs
return self.build_and_install(setup_script, setup_base)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1158, in build_and_install
self.run_setup(setup_script, setup_base, args)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1144, in run_setup
run_setup(setup_script, args)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 253, in run_setup
raise
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 166, in save_modules
saved_exc.resume()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 141, in resume
six.reraise(type, exc, self._tb)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/_vendor/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 154, in save_modules
yield saved
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
yield
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 250, in run_setup
_execfile(setup_script, ns)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 45, in _execfile
exec(code, globals, locals)
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 513, in <module>
# we can't do anything about these warnings because they stem from
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 493, in setup_package
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 290, in generate_cython
for pxifile in _pxifiles:
RuntimeError: Running cythonize failed!
Cythonizing sources
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
^ The error I'm running into trouble with.
I followed the code in the example and I keep getting the error, "No Tables"
Running a fresh pip install of this project inside a jupyter notebook.
from basketball_reference_scraper.players import get_stats, get_game_logs _stats = get_stats('Clint Capela', stat_type='PER_GAME', playoffs=False, career=False, ask_matches=False)
Produces this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-7b87a5c104b7> in <module>
1 from basketball_reference_scraper.players import get_stats, get_game_logs
----> 2 _stats = get_stats('Clint Capela', stat_type='PER_GAME', playoffs=False, career=False, ask_matches=False)
/opt/anaconda3/lib/python3.8/site-packages/basketball_reference_scraper/players.py in get_stats(_name, stat_type, playoffs, career, ask_matches)
12 def get_stats(_name, stat_type='PER_GAME', playoffs=False, career=False, ask_matches = True):
13 name = lookup(_name, ask_matches)
---> 14 suffix = get_player_suffix(name).replace('/', '%2F')
15 selector = stat_type.lower()
16 if playoffs:
AttributeError: 'NoneType' object has no attribute 'replace'
This code works with other players like 'nikola jokic'.
Note: I just installed this and had to install lxml with lxml-4.4.2-cp37-cp37m-win_amd64.whl using pip install manually.
Here is the traceback, it seems like it can't find constants.py for some reason:
Traceback (most recent call last):
File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\teams.py", line 6, in <module>
from constants import TEAM_TO_TEAM_ABBR, TEAM_SETS
ModuleNotFoundError: No module named 'constants'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/yishi/PycharmProjects/baksetball-algo/scraper.py", line 1, in <module>
from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\teams.py", line 10, in <module>
from basketball_reference_scraper.utils import remove_accents
File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\utils.py", line 4, in <module>
import unicodedata, unidecode
ModuleNotFoundError: No module named 'unidecode'
Hello vishaalagartha,
Thanks for the very helpful API.
Just something to bring to your attention.
The Charlotte Hornets were founded in 1988, changed their name to the Charlotte Bobcats in 2004, and changed it back to the Charlotte Hornets in 2014.
However, Basketball Reference uses two different abbreviations: one for the Charlotte Hornets of 1988-04 and another for the Charlotte Hornets of 2004-present.
These two names are:
Charlotte Hornets 1988-04: CHH
Charlotte Hornets 2004-present: CHO
Using the CHO for any stats before 2004 will result in an error.
Thanks again!
This is regarding displaying a DataFrame on a frontend HTML page via Flask.
The function to_html cannot be used because of the older version of pandas that this API pins:
basketball-reference-scraper 1.0.2 requires pandas==0.25.3, but you'll have pandas 1.1.1 which is incompatible.
closed
Team Misc. stats are correct on the website the day after games but are not being scraped correctly.
Missing the newest games.
Here are the Celtics Team and Opponent Stats and Misc. as of 1:54 PM 1/26/2021. (Day after game 16).
This is what I am getting for output. Notice how team and opponent per game stats are up to date but misc. is not.
When you run
s = get_schedule(2019, playoffs = True)
you don't get the playoff games past 2019-05-30. It's the same for all other seasons.
Scrape data from the following endpoint:
https://www.basketball-reference.com/friv/injuries.fcgi
Return a data frame containing the following columns:
['PLAYER', 'TEAM', 'DATE', 'DESCRIPTION']
get_roster() and get_roster_stats() return None for teams that use abbreviations that are also used by current teams.
Example: get_roster('CHI',1949) or get_roster_stats('WAS',1985) return None, but abbreviations for which there are no current teams work fine, like get_roster('KCK',1974).
Also, other functions like get_team_stats('CHI',1950) or get_team_misc('WAS',1985) work fine.
For get_stats in players.py: if I call get_stats('Stephen Curry', 'PER_GAME', False, False, False), I get the error that get_stats only takes 1 to 4 arguments. If I specify ask_matches=False, I get the error that ask_matches is an unexpected keyword argument. Is there any way to turn the ask_matches option off?
Just wondering if there are any plans to keep adding tables/endpoints like Advanced? Great module regardless!
I had trouble scraping Jaren Jackson Jr.'s data using the function get_stats(); I think it returns his dad's. I also tried passing 'Jaren Jackson Jr.', but it returns the same result. I'm not sure whether you remove the suffix in the source code. Any workaround? Thanks
Recreation:
pg = get_game_logs('Pau Gasol', '2010-01-12', '2010-01-20', playoffs=False)
pg.loc[18:22,:] # return index of Pau's 18-22nd game for the season
Expectation:
Code should output the 4 games on 2009-12-06, 2009-12-09, 2009-12-11, and 2009-12-12 numbered as games 18,19,20,21 (offset by one because of 0-indexing) as seen here.
Reality:
Running the code above has those 4 games numbered as 18, 19, 21, 22. It appears that your program correctly skips the line that BBref uses as the header but still increments the index.
How I'm using it:
get_team_stats('UTA', year, data_format='PER_GAME').to_frame().transpose()
Error I'm getting:
\site-packages\basketball_reference_scraper\teams.py:42: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
s['SEASON'] = f'{season_end_year-1}-{str(season_end_year)[2:]}'
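This warning usually means a column is being assigned on an object that pandas derived from another DataFrame. A minimal sketch of the common remedy, taking an explicit copy before assigning, is below. The helper with_season_label and the sample frame are hypothetical; only the SEASON format string comes from the warning output above, and this is not necessarily how teams.py should be patched.

```python
import pandas as pd

def with_season_label(rows, season_end_year):
    """Return a copy of `rows` with a SEASON column such as '2019-20'."""
    # .copy() makes an independent object, so the assignment below cannot
    # be mistaken for a write into a slice of another DataFrame.
    s = rows.copy()
    s["SEASON"] = f"{season_end_year - 1}-{str(season_end_year)[2:]}"
    return s

# Made-up frame standing in for a scraped table.
table = pd.DataFrame({"TEAM": ["UTA", "DEN"], "G": [1, 2]})
utah = with_season_label(table[table["TEAM"] == "UTA"], 2020)
print(utah)
```

The original frame is left untouched, and the warning should no longer be triggered by this assignment.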
It could be how I'm dealing with the data, but some player names (e.g. "Kristaps Porziņģis", "Luka Dončić") are coming back with strange characters in place of the accented characters when using get_roster() (Luka DonÄić, for example). I grabbed the function from here and tried messing around with the encoding and decoding, but no luck. I'm sure there's a workaround for this, but I just can't figure it out.
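The 'Ä' in the garbled name is what UTF-8 bytes tend to look like when they are decoded with a single-byte codec such as Latin-1. Assuming that is what happened here (an assumption, since the issue does not show the response headers), the sketch below reproduces the garbling and reverses it. Both helper names are hypothetical.

```python
def garble(text):
    """Simulate the suspected bug: UTF-8 bytes read back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(text):
    """Undo the garbling by re-encoding as Latin-1 and decoding as UTF-8."""
    return text.encode("latin-1").decode("utf-8")

original = "Luka Dončić"
garbled = garble(original)
restored = repair(garbled)
print(repr(garbled))
print(restored)
```

repair() only helps for text damaged in exactly this way; calling it on a correctly decoded name raises UnicodeEncodeError, because characters such as 'č' have no Latin-1 encoding.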
When I use the function get_schedule(2020, playoffs=False), it returns the schedule from the season opener through 2019-10-31 but no games afterward. This seems to be the only season in which this is the case.
Thanks!
For example, a way to get Stephen Curry's statlines for each game he played in in 2018.
ValueError Traceback (most recent call last)
in
----> 1 schedule1=get_schedule(1971,playoffs=False)
/opt/anaconda3/lib/python3.7/site-packages/basketball_reference_scraper/seasons.py in get_schedule(season, playoffs)
16 soup = BeautifulSoup(r.content, 'html.parser')
17 table = soup.find('table', attrs={'id': 'schedule'})
---> 18 month_df = pd.read_html(str(table))[0]
19 df = df.append(month_df)
20 df = df.reset_index()
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
1103 na_values=na_values,
1104 keep_default_na=keep_default_na,
-> 1105 displayed_only=displayed_only,
1106 )
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
910 break
911 else:
--> 912 raise_with_traceback(retained)
913
914 ret = []
/opt/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
45 if traceback == Ellipsis:
46 _, _, traceback = sys.exc_info()
---> 47 raise exc.with_traceback(traceback)
48
49
ValueError: No tables found
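In the traceback above, table comes from soup.find(...), which returns None when no element matches. str(None) is the text "None", which contains no table, so that is a likely source of the ValueError. A small guard, sketched below with a hypothetical helper name, would turn this into a clearer message before pd.read_html is ever called.

```python
def require_table(table, description):
    """Return the table's HTML, or fail with a message naming what was missing."""
    if table is None:
        # soup.find(...) found nothing; report which page or season was requested.
        raise LookupError(f"No <table> element found for {description}")
    return str(table)

# Usage with a stand-in value; in the scraper this would be a BeautifulSoup tag.
html = require_table("<table><tr><td>1</td></tr></table>", "1971 schedule")
print(html)
```

Inside get_schedule, the result of require_table(table, ...) would be what gets passed to pd.read_html, so a missing table is reported by name instead of as "No tables found".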
After fiddling around with the web scraper for a while, I noticed that while using the get_roster_stats function, setting the data format variable to 'PER_36' returns the following:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
get_roster_stats('GSW', 2019, 'PER_36', False)
File "C:\Program Files\Python38\Lib\site-packages\basketball_reference_scraper\teams.py", line 98, in get_roster_stats
df2 = pd.read_html(str(table))[0]
File "C:\Program Files\Python38\Lib\site-packages\pandas\io\html.py", line 1090, in read_html
return _parse(
File "C:\Program Files\Python38\Lib\site-packages\pandas\io\html.py", line 912, in _parse
raise_with_traceback(retained)
File "C:\Program Files\Python38\Lib\site-packages\pandas\compat\__init__.py", line 47, in raise_with_traceback
raise exc.with_traceback(traceback)
ValueError: No tables found
However, it is well known that basketball-reference indeed has these statistics on their website--is there a fix for this?
box_scores.get_box_scores is returning the same player name multiple times on the same team or opponent and in some cases the player names are incorrect. The stats seem ok, just the player names. This seems to occur in many of the box scores I have looked at. Here are a few examples:
2018-11-30, ORL, PHO, GAME, BASIC - Aaron Gordon returned as himself as part of ORL and instead of De'Anthony Melton for PHO
2021-04-22, LAL, DAL, GAME, BASIC - Alex Caruso returned as himself as part of LAL and instead of Luka Dončić and J.J. Redick for DAL. Looking deeper at this game, several of the DAL player names are incorrect.
get_game_logs('LeBron James', '2003-08-01', '2020-02-02', True)
and received the following stack trace:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/kwaku/Development/nba-data/venv/lib/python3.7/site-packages/basketball_reference_scraper/players.py", line 65, in get_game_logs
if len(row['GS'])>1:
TypeError: object of type 'int' has no len()
playoffs=False
get_stats in players.py and get_player_suffix in utils.py seem to run into problems when they're either used with a player name with non-traditional characters or when used with names shared with other NBA players. The former causes no dataframes to be returned because it cannot construct a valid suffix and the latter causes a dataframe of the wrong player to be returned.
For example, the API sees no difference between Tim Hardaway Sr. and Tim Hardaway Jr.
Another example, Luka Dončić, Dario Šarić, and others are not handled well by get_player_suffix, resulting in an error.
I have fixed it somewhat on my end by tailoring the code to my needs--however, do let me know if a future version of your API fixes this, thanks!
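For the accented-name half of this report, one workaround on the caller's side is to strip the accents before calling the lookup. The sketch below uses only the standard library. strip_accents is a hypothetical helper, not the package's remove_accents, and whether the lookup then finds the right player is an assumption, based only on the earlier report that the plain spelling 'nikola jokic' works. It does nothing for the Sr./Jr. ambiguity.

```python
import unicodedata

def strip_accents(name):
    """Return `name` with diacritics removed, e.g. 'č' becomes 'c'."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for player in ["Luka Dončić", "Dario Šarić", "Kristaps Porziņģis"]:
    print(strip_accents(player))
```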
Hello,
In box reports, players' names with accents are fixed using utils.remove_accents; I guess the same should be done for injury reports. This could in principle be easily done, like
def get_injury_report():
    r = get('https://widgets.sports-reference.com/wg.fcgi?css=1&site=bbr&url=%2Ffriv%2Finjuries.fcgi&div=div_injuries')
    if r.status_code==200:
        soup = BeautifulSoup(r.content, 'html.parser')
        table = soup.find('table')
        df = pd.read_html(str(table))[0]
        df.rename(columns = {'Player': 'PLAYER', 'Team': 'TEAM', 'Update': 'DATE', 'Description': 'DESCRIPTION'}, inplace=True)
        df['TEAM'] = df['TEAM'].apply(lambda x: TEAM_TO_TEAM_ABBR[x.upper()])
        df['DATE'] = df['DATE'].apply(lambda x: pd.to_datetime(x))
        # Normalize accented player names row by row (axis=1), using the latest report year.
        df['PLAYER'] = df.apply(lambda injury: remove_accents(injury.PLAYER, injury.TEAM, df['DATE'].max().year), axis=1)
        # DESCRIPTION is expected to look like "STATUS (INJURY) - details".
        df['STATUS'] = df['DESCRIPTION'].apply(lambda x: x[:x.index('(')].strip())
        df['INJURY'] = df['DESCRIPTION'].apply(lambda x: x[x.index('(')+1:x.index(')')].strip())
        df['DESCRIPTION'] = df['DESCRIPTION'].apply(lambda x: x[x.index('-')+2:].strip())
        return df
However, I fear that might fail e.g. if someone got traded and there was no injury yet in the new year or so. What do you think?
Cheers :-)
I'm getting a "no tables found" error when I use get_box_scores, but only for the Washington Wizards vs. Philadelphia 76ers game from yesterday. Is there any way I can prevent this or work around it? To further clarify, I used the team abbreviations PHI and WAS and the date 2020-12-23. Also, the day before that, the box score for the Lakers and Clippers game read Dennis Schroder as Danny Green; not sure if a fix is required for that. Overall, a really great API that I've been using.
The function get_game_logs('Kobe Bryant', '2010-01-12', '2010-01-20', playoffs=False) returns a 73 row pandas dataframe that has every game from 2009-10-27 to 2010-04-11.
Expectation: To return only game logs of Kobe that were between the dates provided, inclusively.
A similar (incorrect) result is seen when inputting:
get_game_logs('Pau Gasol', '2010-01-12', '2010-01-20', playoffs=False)
get_game_logs('Kobe Bryant', '2010-01-12', '2011-01-20', playoffs=False)
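Until the date arguments are honoured, a caller could trim the returned frame afterwards. The sketch below assumes the DATE column holds values pandas can parse as dates. filter_by_date is a hypothetical helper, and the rows are placeholders chosen around the dates mentioned above, not real game logs.

```python
import pandas as pd

def filter_by_date(df, start_date, end_date, column="DATE"):
    """Keep rows whose date falls within [start_date, end_date], inclusive."""
    dates = pd.to_datetime(df[column])
    mask = (dates >= pd.Timestamp(start_date)) & (dates <= pd.Timestamp(end_date))
    return df[mask].reset_index(drop=True)

# Placeholder rows: two outside the requested window, three inside it.
logs = pd.DataFrame(
    {"DATE": ["2009-10-27", "2010-01-12", "2010-01-15", "2010-01-20", "2010-04-11"]}
)
window = filter_by_date(logs, "2010-01-12", "2010-01-20")
print(window)
```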
The following has two dummy stats, which cause later fields to be off. And the empty dummies result in a NaN (not a number)
<tr><th class="left " data-stat="season" scope="row">1 season</th><td class="center iz" data-stat="age"></td><td class="left " data-stat="team_id"><a href="https://www.basketball-reference.com/teams/IND/">IND</a></td><td class="left " data-stat="lg_id">NBA</td><td class="center iz" data-stat="pos"></td><td class="right " data-stat="g">65</td><td class="right " data-stat="mp">894</td><td class="right " data-stat="per">9.5</td><td class="right " data-stat="ts_pct">.507</td><td class="right " data-stat="fg3a_per_fga_pct">.427</td><td class="right " data-stat="fta_per_fga_pct">.133</td><td class="right " data-stat="orb_pct">2.3</td><td class="right " data-stat="drb_pct">6.3</td><td class="right " data-stat="trb_pct">4.3</td>
<td class="right " data-stat="ast_pct">20.7</td><td class="right " data-stat="stl_pct">1.4</td><td class="right " data-stat="blk_pct">0.9</td><td class="right " data-stat="tov_pct">17.2</td><td class="right " data-stat="usg_pct">19.2</td><td class="right iz" data-stat="DUMMY"></td><td class="right " data-stat="ows">-0.2</td><td class="right " data-stat="dws">0.5</td><td class="right " data-stat="ws">0.3</td><td class="right " data-stat="ws_per_48">.016</td><td class="right iz" data-stat="DUMMY"></td><td class="right " data-stat="obpm">-2.4</td><td class="right " data-stat="dbpm">-1.2</td><td class="right " data-stat="bpm">-3.7</td><td class="right " data-stat="vorp">-0.4</td></tr>
For example, (This specific example is for Aaron Brooks)
['2016-17', 32.0, 'IND', 'NBA', 'PG', 65.0, 894.0, 9.5, 0.507, 0.427, 0.133, 2.3, 6.3, 4.3, 20.7, 1.4, 0.9, 17.2, 19.2, nan, -0.2, 0.5, 0.3, 0.016, nan, -2.4, -1.2, -3.7, -0.4]
This happens for every single player, and the dummy may be in different locations.
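Since the empty cells carry data-stat="DUMMY", one way to keep later fields aligned is to read each cell together with its data-stat name and skip the DUMMY ones, instead of relying on position. The sketch below does this with the standard library's HTML parser on a shortened version of the row quoted above. RowParser is a hypothetical class, not part of the package.

```python
from html.parser import HTMLParser

class RowParser(HTMLParser):
    """Collect (data-stat, text) pairs from table cells, skipping DUMMY cells."""

    def __init__(self):
        super().__init__()
        self.cells = []     # list of (stat_name, text) in document order
        self._stat = None   # data-stat of the cell currently being read
        self._text = []     # text fragments seen inside that cell

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._stat = dict(attrs).get("data-stat")
            self._text = []

    def handle_data(self, data):
        if self._stat is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._stat is not None:
            if self._stat != "DUMMY":
                self.cells.append((self._stat, "".join(self._text).strip()))
            self._stat = None

# Shortened copy of the row from the issue: one DUMMY cell between two stats.
row_html = (
    '<tr><th class="left " data-stat="season" scope="row">1 season</th>'
    '<td class="right " data-stat="usg_pct">19.2</td>'
    '<td class="right iz" data-stat="DUMMY"></td>'
    '<td class="right " data-stat="ows">-0.2</td></tr>'
)
parser = RowParser()
parser.feed(row_html)
parser.close()
print(parser.cells)
```

Because every value is keyed by its data-stat name, the position of the DUMMY cell no longer matters.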
When running the following code:
game = get_game_logs('Chris Paul', '2020-12-22', '2020-12-30', playoffs=False)
print(game)
The output is:
Results for Chris Paul:
Empty DataFrame
Columns: [DATE, AGE, TEAM, HOME/AWAY, OPPONENT, RESULT, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, FT, FTA, FT%, ORB, DRB, TRB, AST, STL, BLK, TOV, PF, PTS, GAME_SCORE, +/-]
Index: []
I can pull game logs from previous seasons with no problem using the method above. Please let me know if I am not using the function correctly.
EDIT: I was able to get Blake Griffin's and Jamal Murray's 2020-2021 game log, but haven't been able to obtain any others for this season yet.
Hi all,
Bear with me, I'm very novice with Python and instead of finding out how to scrape Basketball-Reference myself I figured why not try this module first.
The game logs for regular season games seem to work fine using get_game_logs('LaMarcus Aldridge', '2013-04-20', '2017-08-01', playoffs=False), but if I want to look at playoff game logs and do get_game_logs('LaMarcus Aldridge', '2013-04-20', '2017-08-01', playoffs=True), I get the following error on the first try:
C:\Users\myname\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py:1115: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = method(y)
What I get is an empty Pandas Dataframe. I also tried this with Damian Lillard and it also didn't work.
Am I doing something wrong?
P.S. I tried get_game_logs('LeBron James', '2006-04-01', '2020-11-01', True) and this seems to work partially: I get a DataFrame containing the latest game logs; the only problem is that all game logs before 2011 are missing.
This package depends on unidecode to work properly, yet doesn't list unidecode as a requirement. I'll try to submit a fix as a pull request later today, but I'm fine if someone else does it.
when ask_matches = False the lookup function needs to sort the list like it does when it's true and the list is > 1:
matches.sort(key=lambda tup: tup[1])
In utils.py get_player_suffix, only the first player name match is returned, which fails when there are players with the same names, for example, Dee Brown:
https://www.basketball-reference.com/players/b/brownde01.html
https://www.basketball-reference.com/players/b/brownde03.html
You might add a get_players method which returns a list of dfs for each player that matches, since there aren't that many players with duplicates at the moment (by my count, there are only doubles, not triples, and <20 in the past 30 years).
Try to use the player 'Lary Nance'.
The returned data is always for this page:
https://www.basketball-reference.com/players/n/nancela01.html
although I'm looking for this page:
https://www.basketball-reference.com/players/n/nancela02.html
play_by_play returns an error when trying the replit example in documentation
full stack trace below
Traceback (most recent call last):
File "main.py", line 6, in <module>
print(client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16))
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/client.py", line 220, in play_by_play
values = http_service.play_by_play(home_team=home_team, day=day, month=month, year=year)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/http_service.py", line 106, in play_by_play
away_team_name=page.away_team_name,
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/html.py", line 791, in away_team_name
return self.team_names[0]
IndexError: list index out of range
get_team_misc(team, 2021)
This is not scraping the most up-to-date misc. stats for each team. It looks like it is grabbing stats that are a day behind.
Can you please add this :)
Various functions state that the data format can be 'RANK' or 'Y/Y'; however, when run as such, they return either 'ValueError: No tables found' or 'UnboundLocalError: local variable 'selector' referenced before assignment'. What is the proper usage for RANK, and what is the 'Y/Y' data format?
Hi @vishaalagartha,
Thanks for this package! I ran into an issue while using the basketball_reference_scraper.teams.get_roster function.
Here's the code to produce the error:
from basketball_reference_scraper.teams import get_roster
get_roster("FTW", 1956)
Here's the error:
~/.pyenv/versions/3.8.5/envs/global/lib/python3.8/site-packages/basketball_reference_scraper/teams.py in <lambda>(x)
21 df['PLAYER'] = df['PLAYER'].apply(lambda name: remove_accents(name, team, season_end_year))
22 df['BIRTH_DATE'] = df['BIRTH_DATE'].apply(lambda x: pd.to_datetime(x))
---> 23 df['NATIONALITY'] = df['NATIONALITY'].apply(lambda x: x.upper())
24 return df
25
AttributeError: 'float' object has no attribute 'upper'
If you look at the page for FTW, 1956, you'll see there's no nationality listed for Chuck Noble. The resulting value is then NaN, which doesn't have an upper function.
Using the built-in pd.Series.str accessor should address this issue:
df['NATIONALITY'] = df['NATIONALITY'].str.upper()
The Miami Heat 2021 roster is given here. The table contains an empty row, which causes get_roster to crash because remove_accents(name, team, season_end_year) finds a NaN instead of a string. One solution would be to remove the NaN players in get_roster one line above, e.g.
df = df[df['PLAYER'].notna()]
df['PLAYER'] = df['PLAYER'].apply(lambda name: remove_accents(name, team, season_end_year))
It should also be combined with another update to utils.remove_accents; for example, line 54 should be
matches = sum(l1 == l2 for l1, l2 in zip(p, name)) if pd.notna(p) else 0
By the way, what is the purpose of the remove_accents() function?
Running
schedule = get_schedule(season, playoffs=True)
gives a ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements, due to the following line in seasons.py (line 28):
df.columns = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS']
It looks like the dataframe now includes an "ARENA" column, so the corrected line should be
df.columns = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS', 'ARENA']
EDIT: I posted a PR to address this here.
Attempting to get the box score for the 2012 all-star game returns an error about out of range indexing for a player's last name
Code to reproduce:
from basketball_reference_scraper.box_score import get_all_star_box_score
print(get_all_star_box_score(2012))
Traceback (most recent call last):
File "/home/jd/git/basketball_reference_scraper/test/test_box_scores.py", line 25, in test_get_all_star_box_score
d = get_all_star_box_score(2012)
File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/box_scores.py", line 88, in get_all_star_box_score
stats_df = get_stats(dnp, ask_matches=False)
File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/players.py", line 14, in get_stats
suffix = get_player_suffix(name)
File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/utils.py", line 90, in get_player_suffix
initial = last_name_part[0].lower()
IndexError: string index out of range
Hello,
I have the error when inputting:
from basketball_reference_scraper.drafts import get_draft_class
ModuleNotFoundError: No module named 'basketball_reference_scraper.drafts'
I tested with Anaconda but also using Google colab. Other imports from the tutorial page works without error.
Thanks for looking into this,
Regards
A normal call like get_game_logs('Thabo Sefolosha', '2013-08-01', '2014-02-02') will return the correct game logs, whereas for some players eg. get_game_logs('DeMarcus Cousins', '2013-08-01', '2014-02-02') does not return anything. Also happens for random other players (Patty Mills, Gerald Henderson, etc.). May be due to the bball-ref widget function
The code works perfectly fine on Latin names, but when I try to get names with ć, the code fails.
I tried to get the data on "Nikola Jokic", and after that returned empty, I tried to get the data with the name as stored in the same database. Then the code failed.
The code fails at line 29 in the file utils.py.
this is my code:
d = get_box_scores('2020-01-06', 'DEN', 'ATL')
niko = d['DEN']['PLAYER'][0]
temp = get_game_logs(niko, "2019-10-10", "2020-10-10", playoffs=False)
When I try getting Dario Šarić's stats, I get the following error:
Traceback (most recent call last):
File "C:\Users\Alexis\Downloads\winter2020\eecs497\algo.py", line 37, in <module>
print(get_stats("Dario Šarić"))
File "C:\Program Files\Python38\Lib\site-packages\basketball_reference_scraper\players.py", line 11, in get_stats
suffix = get_player_suffix(name).replace('/', '%2F')
AttributeError: 'NoneType' object has no attribute 'replace'
What format do names have to be in?
Previous 2020 schedule does not seem to be working. Keep up the great work!