vishaalagartha / basketball_reference_scraper

A python module for scraping static and dynamic content from Basketball Reference.

License: MIT License

Python 100.00%

basketball_reference_scraper's Issues

can't install using pip3

The error I'm running into:

ERROR: Command errored out with exit status 1:
command: /Library/Developer/CommandLineTools/usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/tmp/pip-install-adjqudls/pandas/setup.py'"'"'; file='"'"'/private/tmp/pip-install-adjqudls/pandas/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /private/tmp/pip-install-adjqudls/pandas/
Complete output (101 lines):
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_philox.pyx
Traceback (most recent call last):
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 59, in process_pyx
from Cython.Compiler.Version import version as cython_version
ModuleNotFoundError: No module named 'Cython'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 235, in <module>
    main()
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 231, in main
    find_process_files(root_dir)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 222, in find_process_files
    process(root_dir, fromfile, tofile, function, hash_db)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 188, in process
    processor_function(fromfile, tofile)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 64, in process_pyx
    raise OSError('Cython needs to be installed in Python as a module')
OSError: Cython needs to be installed in Python as a module
Running from numpy source directory.
/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py:485: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
  run_build = parse_setuppy_commands()
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 154, in save_modules
    yield saved
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 513, in <module>
    #  we can't do anything about these warnings because they stem from
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 493, in setup_package

  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 290, in generate_cython
    for pxifile in _pxifiles:
RuntimeError: Running cythonize failed!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/tmp/pip-install-adjqudls/pandas/setup.py", line 809, in <module>
    setup(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/dist.py", line 716, in fetch_build_eggs
    resolved_dists = pkg_resources.working_set.resolve(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 780, in resolve
    dist = best[req.key] = env.best_match(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/dist.py", line 786, in fetch_build_egg
    return cmd.easy_install(req)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 679, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 705, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 890, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1158, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1144, in run_setup
    run_setup(setup_script, args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 253, in run_setup
    raise
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 166, in save_modules
    saved_exc.resume()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 141, in resume
    six.reraise(type, exc, self._tb)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/_vendor/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 154, in save_modules
    yield saved
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 513, in <module>
    #  we can't do anything about these warnings because they stem from
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 493, in setup_package

  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 290, in generate_cython
    for pxifile in _pxifiles:
RuntimeError: Running cythonize failed!
Cythonizing sources
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

^ This is the error I'm running into.

Nothing is working

I followed the code in the example and I keep getting the error, "No Tables"

Retrieving per game stats for 'Clint Capela' leads to AttributeError: 'NoneType' object has no attribute 'replace'

Running a fresh pip install of this project inside a jupyter notebook.

from basketball_reference_scraper.players import get_stats, get_game_logs
_stats = get_stats('Clint Capela', stat_type='PER_GAME', playoffs=False, career=False, ask_matches=False)

Produces this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-7b87a5c104b7> in <module>
      1 from basketball_reference_scraper.players import get_stats, get_game_logs
----> 2 _stats = get_stats('Clint Capela', stat_type='PER_GAME', playoffs=False, career=False, ask_matches=False)

/opt/anaconda3/lib/python3.8/site-packages/basketball_reference_scraper/players.py in get_stats(_name, stat_type, playoffs, career, ask_matches)
     12 def get_stats(_name, stat_type='PER_GAME', playoffs=False, career=False, ask_matches = True):
     13     name = lookup(_name, ask_matches)
---> 14     suffix = get_player_suffix(name).replace('/', '%2F')
     15     selector = stat_type.lower()
     16     if playoffs:

AttributeError: 'NoneType' object has no attribute 'replace'

This code works with other players like 'nikola jokic'.
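
The traceback suggests that get_player_suffix returned None for this name. A minimal sketch of a guard follows. Both `encoded_suffix` and the stand-in `fake_get_player_suffix` are hypothetical names used only for illustration and are not part of the library; the suffix value in the stand-in is sample data.

```python
from typing import Callable, Optional


def encoded_suffix(name: str, get_suffix: Callable[[str], Optional[str]]) -> str:
    """Return the URL-encoded suffix, or raise a clear error if the lookup failed."""
    suffix = get_suffix(name)
    if suffix is None:
        # Fail with a readable message instead of AttributeError on None.
        raise LookupError(f"No player page found for {name!r}")
    return suffix.replace('/', '%2F')


# Hypothetical stand-in for the library's lookup (sample data only).
def fake_get_player_suffix(name: str) -> Optional[str]:
    known = {'Nikola Jokic': '/players/j/jokicni01.html'}
    return known.get(name)


print(encoded_suffix('Nikola Jokic', fake_get_player_suffix))
```

With a guard like this, a failed lookup surfaces as a LookupError that names the player rather than as an AttributeError deep inside get_stats.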

No module named 'constants'

Note: I just installed this and had to install lxml with lxml-4.4.2-cp37-cp37m-win_amd64.whl using pip install manually.
Here is the traceback, it seems like it can't find constants.py for some reason:

Traceback (most recent call last):
  File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\teams.py", line 6, in <module>
    from constants import TEAM_TO_TEAM_ABBR, TEAM_SETS
ModuleNotFoundError: No module named 'constants'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/yishi/PycharmProjects/baksetball-algo/scraper.py", line 1, in <module>
    from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
  File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\teams.py", line 10, in <module>
    from basketball_reference_scraper.utils import remove_accents
  File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\utils.py", line 4, in <module>
    import unicodedata, unidecode
ModuleNotFoundError: No module named 'unidecode'

Missing Charlotte Hornets CHH

Hello vishaalagartha,

Thanks for the very helpful API.

Just something to bring to your attention.

The Charlotte Hornets were founded in 1988, changed their name to the Charlotte Bobcats in 2004, and changed it back to the Charlotte Hornets in 2014.

However Basketball Reference has 2 different short names for Charlotte Hornets 1988-04 and Charlotte Hornets 2004-present.
These two names are:

Charlotte Hornets 1988-04: CHH
Charlotte Hornets 2004-present: CHO

Using the CHO for any stats before 2004 will result in an error.

Thanks again!

API requiring pandas==0.25.3. Possible to be compatible with pandas 1.1.1?

This is regarding displaying dataframe to frontend HTML page via flask.
The to_html function cannot be used because of the older version of pandas that this API requires.

basketball-reference-scraper 1.0.2 requires pandas==0.25.3, but you'll have pandas 1.1.1 which is incompatible.
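
To see which versions are actually installed before chasing the conflict, the standard-library `importlib.metadata` module can be queried. This is only a diagnostic sketch; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("pandas", "basketball-reference-scraper"):
    print(pkg, installed_version(pkg))
```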

closed

get_team_misc(team, 2021)

Team Misc. stats are correct on the website the day after games but are not being scraped correctly.
The newest games are missing.
Here are the Celtics Team and Opponent Stats and Misc. as of 1:54 PM 1/26/2021. (Day after game 16).

This is what I am getting for output. Notice how team and opponent per game stats are up to date but misc. is not.

Get schedule return the wrong time period for playoffs.

When you run
s = get_schedule(2019, playoffs = True)

you don't get the playoff games past 2019-05-30. It's the same for all other seasons.

Add Player Injuries Endpoint

Scrape data from the following endpoint:
https://www.basketball-reference.com/friv/injuries.fcgi

Return a data frame containing the following columns:

['PLAYER', 'TEAM', 'DATE', 'DESCRIPTION']

get_roster() and get_roster_stats() don't work for older teams with abbreviations that are used by current teams

get_roster() and get_roster_stats() return None for teams that use abbreviations that are also used by current teams.

Example: get_roster('CHI',1949) or get_roster_stats('WAS',1985) returns None, but abbreviations that no current team uses work fine, like get_roster('KCK',1974).

Also other functions like get_team_stats('CHI',1950) or get_team_misc('WAS',1985) work fine.

Can't set ask_matches = False.

For get_stats in players.py: if I call get_stats('Stephen Curry', 'PER_GAME', False, False, False), I get an error saying that get_stats only takes 1 to 4 arguments. If I specify ask_matches=False, I get an error saying that ask_matches is an unexpected keyword argument. Is there any way to turn the ask_matches option off?
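
Both errors are consistent with an installed version whose get_stats does not yet define ask_matches. A quick way to check is to inspect the function signature. In this sketch, `accepts_keyword` is a hypothetical helper, and `old_get_stats` / `new_get_stats` are stand-ins that only mimic the two signatures implied by the reports on this page.

```python
import inspect


def accepts_keyword(func, keyword: str) -> bool:
    """Return True if func can be called with the given keyword argument."""
    params = inspect.signature(func).parameters
    if keyword in params:
        return True
    # A **kwargs parameter would also accept it.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for an older and a newer signature (assumptions for illustration).
def old_get_stats(_name, stat_type='PER_GAME', playoffs=False, career=False):
    ...


def new_get_stats(_name, stat_type='PER_GAME', playoffs=False, career=False, ask_matches=True):
    ...


print(accepts_keyword(old_get_stats, 'ask_matches'))
print(accepts_keyword(new_get_stats, 'ask_matches'))
```

Running the same check against the real get_stats shows whether upgrading the package is needed.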

Any plans to add more endpoints like Advanced?

Just wondering if there are any plans to keep adding tables/endpoints like Advanced? Great module regardless!

Identify players with same name or same name different suffix

I had trouble scraping Jaren Jackson Jr's data using function get_stats( ), I think it returns his dad. I also tried to input 'Jaren Jackson Jr.' but it returns the same result, not sure if you remove the suffix in the source code. Any workaround? Thanks

Player Game Logs Pandas Index numbering is incorrect

Recreation:
pg = get_game_logs('Pau Gasol', '2010-01-12', '2010-01-20', playoffs=False)
pg.loc[18:22,:] # return index of Pau's 18-22nd game for the season

Expectation:
Code should output the 4 games on 2009-12-06, 2009-12-09, 2009-12-11, and 2009-12-12 numbered as games 18,19,20,21 (offset by one because of 0-indexing) as seen here.

Reality:
Running the code above has those 4 games numbered as 18, 19, 21, 22. It appears that your program correctly skips the line that BBref uses as the header but still increments the index.

get_team_stats error

How I'm using it:

get_team_stats('UTA', year, data_format='PER_GAME').to_frame().transpose()

Error I'm getting:

\site-packages\basketball_reference_scraper\teams.py:42: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
s['SEASON'] = f'{season_end_year-1}-{str(season_end_year)[2:]}'

Missing data

I would attempt to get stats from a team's roster and columns would be missing and replaced by a column of "..." (no matter how many columns were missing, it would be replaced by a singular column).

Player names with accents not rendering properly

It could be how I'm dealing with the data, but some player names (e.g. "Kristaps Porziņģis", "Luka Dončić") are coming back with strange characters in place of the accented characters when using get_roster() (Luka DonÄiÄ‡, for example). I grabbed the function from here and tried messing around with the encoding and decoding, but no luck. I'm sure there's a workaround for this but I just can't figure it out.
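
The garbled form in the report looks like UTF-8 bytes that were decoded with a single-byte Western codec. The sketch below reproduces and reverses that effect. Latin-1 is used as an assumption because it round-trips every byte; the actual codec involved is not known from the report, and `repair_mojibake` is a hypothetical helper.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were decoded as Latin-1; otherwise return text unchanged."""
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by that mix-up, so leave it alone.
        return text


original = "Luka Dončić"
garbled = original.encode('utf-8').decode('latin-1')  # simulate the wrong decode
print(repair_mojibake(garbled) == original)
```

If the page bytes are available, decoding them explicitly as UTF-8 before parsing avoids the problem at the source.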

Full 2020 Season Schedule Not Displaying

When I use the function get_schedule(2020, playoffs=False), it returns the schedule from the season opener through 2019-10-31 but no games afterward. This seems to be the only season in which this is the case.

Thanks!

Add player stats per game

For example, a way to get Stephen Curry's statlines for each game he played in in 2018.

get_schedule() can't find result for 1953 and 1971 (season+playoffs)

from basketball_reference_scraper.seasons import get_schedule, get_standings
schedule1=get_schedule(1971,playoffs=False)

ValueError Traceback (most recent call last)
in
----> 1 schedule1=get_schedule(1971,playoffs=False)

/opt/anaconda3/lib/python3.7/site-packages/basketball_reference_scraper/seasons.py in get_schedule(season, playoffs)
16 soup = BeautifulSoup(r.content, 'html.parser')
17 table = soup.find('table', attrs={'id': 'schedule'})
---> 18 month_df = pd.read_html(str(table))[0]
19 df = df.append(month_df)
20 df = df.reset_index()

/opt/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
1103 na_values=na_values,
1104 keep_default_na=keep_default_na,
-> 1105 displayed_only=displayed_only,
1106 )

/opt/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
910 break
911 else:
--> 912 raise_with_traceback(retained)
913
914 ret = []

/opt/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
45 if traceback == Ellipsis:
46 _, _, traceback = sys.exc_info()
---> 47 raise exc.with_traceback(traceback)
48
49

ValueError: No tables found

team roster per 36 values missing

After fiddling around with the web scraper for a while, I noticed that while using the get_roster_stats function, setting the data format variable to 'PER_36' returns the following:

Traceback (most recent call last):
File "<pyshell#6>", line 1, in
get_roster_stats('GSW', 2019, 'PER_36', False)
File "C:\Program Files\Python38\Lib\site-packages\basketball_reference_scraper\teams.py", line 98, in get_roster_stats
df2 = pd.read_html(str(table))[0]
File "C:\Program Files\Python38\Lib\site-packages\pandas\io\html.py", line 1090, in read_html
return _parse(
File "C:\Program Files\Python38\Lib\site-packages\pandas\io\html.py", line 912, in _parse
raise_with_traceback(retained)
File "C:\Program Files\Python38\Lib\site-packages\pandas\compat\__init__.py", line 47, in raise_with_traceback
raise exc.with_traceback(traceback)
ValueError: No tables found

However, it is well known that basketball-reference indeed has these statistics on their website--is there a fix for this?

get_box_scores returning same player name multiple times or incorrect player name

box_scores.get_box_scores is returning the same player name multiple times on the same team or opponent and in some cases the player names are incorrect. The stats seem ok, just the player names. This seems to occur in many of the box scores I have looked at. Here are a few examples:

2018-11-30, ORL, PHO, GAME, BASIC - Aaron Gordon returned as himself as part of ORL and instead of De'Anthony Melton for PHO

2021-04-22, LAL, DAL, GAME, BASIC - Alex Caruso returned as himself as part of LAL and instead of Luka Dončić and J.J. Redick for DAL. Looking deeper at this game, several of the DAL player names are incorrect.

playoff games logs error on function call

I tried get_game_logs('LeBron James', '2003-08-01', '2020-02-02', True)

and received the following stack trace:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/kwaku/Development/nba-data/venv/lib/python3.7/site-packages/basketball_reference_scraper/players.py", line 65, in get_game_logs
    if len(row['GS'])>1:
TypeError: object of type 'int' has no len()

The function works when the parameter playoffs=False.

Error with get_stats and get_player_suffix

get_stats in players.py and get_player_suffix in utils.py seem to run into problems when they're either used with a player name with non-traditional characters or when used with names shared with other NBA players. The former causes no dataframes to be returned because it cannot construct a valid suffix and the latter causes a dataframe of the wrong player to be returned.

For example, the API sees no difference between Tim Hardaway Sr. and Tim Hardaway Jr.
Another example, Luka Dončić, Dario Šarić, and others are not handled well by get_player_suffix, resulting in an error.

I have fixed it somewhat on my end by tailoring the code to my needs--however, do let me know if a future version of your API fixes this, thanks!
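
One common way to make accented names comparable is Unicode decomposition with the standard-library `unicodedata` module. This is a sketch of that general technique, not the library's own remove_accents; `strip_accents` is a hypothetical helper.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize('NFKD', name)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


for player in ("Luka Dončić", "Dario Šarić", "Kristaps Porziņģis"):
    print(strip_accents(player))
```

Letters that have no decomposition (for example "ø") pass through unchanged, so this does not cover every name. It also does nothing for the Sr./Jr. ambiguity, which needs a separate disambiguation step.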

Remove (rather fix) accents in injury reports

Hello,

In box reports player's names with accents are fixed using utils.remove_accents, I guess the same should be done for injury reports. This could in principle be easily done, like

def get_injury_report():
    r = get(f'https://widgets.sports-reference.com/wg.fcgi?css=1&site=bbr&url=%2Ffriv%2Finjuries.fcgi&div=div_injuries')
    if r.status_code==200:
        soup = BeautifulSoup(r.content, 'html.parser')
        table = soup.find('table')
        df = pd.read_html(str(table))[0]
        df.rename(columns = {'Player': 'PLAYER', 'Team': 'TEAM', 'Update': 'DATE', 'Description': 'DESCRIPTION'}, inplace=True)
        df['TEAM'] = df['TEAM'].apply(lambda x: TEAM_TO_TEAM_ABBR[x.upper()])
        df['DATE'] = df['DATE'].apply(lambda x: pd.to_datetime(x))
        df['PLAYER'] = df.apply(lambda injury: remove_accents(injury.PLAYER, injury.TEAM, df['DATE'].max().year), axis=1)
        df['STATUS'] = df['DESCRIPTION'].apply(lambda x: x[:x.index('(')].strip())
        df['INJURY'] = df['DESCRIPTION'].apply(lambda x: x[x.index('(')+1:x.index(')')].strip())
        df['DESCRIPTION'] = df['DESCRIPTION'].apply(lambda x: x[x.index('-')+2:].strip())
        return df

However, I fear that might fail e.g. if someone got traded and there was no injury yet in the new year or so. What do you think?

Cheers :-)
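
The three lambdas that split DESCRIPTION in the snippet above call x.index, which raises ValueError whenever a '(' or '-' is missing. Below is a more forgiving sketch using a regular expression. It assumes the text has the shape "STATUS (INJURY) - DETAIL", as implied by the slicing in the snippet; `split_description` and the sample strings are hypothetical.

```python
import re
from typing import Dict

# Assumed shape: "STATUS (INJURY) - DETAIL"
_PATTERN = re.compile(r'^(?P<status>[^(]*)\((?P<injury>[^)]*)\)\s*-\s*(?P<detail>.*)$')


def split_description(text: str) -> Dict[str, str]:
    """Split an injury description, falling back to the raw text if it does not match."""
    match = _PATTERN.match(text)
    if match is None:
        return {'STATUS': '', 'INJURY': '', 'DESCRIPTION': text.strip()}
    return {
        'STATUS': match.group('status').strip(),
        'INJURY': match.group('injury').strip(),
        'DESCRIPTION': match.group('detail').strip(),
    }


print(split_description('Out (Knee) - Expected to miss two weeks'))
print(split_description('Day to day'))
```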

ValueError: No Tables Found

I'm getting a "No tables found" error when I use get_box_scores, but only for the Washington Wizards vs. Philadelphia 76ers game from yesterday. Is there any way I can prevent this or work around it? To clarify, I used the team abbreviations PHI and WAS and the date 2020-12-23. Also, the day before that, the box score for the Lakers and Clippers game read Dennis Schroder as Danny Green; I'm not sure if a fix is required for that. Overall, a really great API that I've been using.

get_game_logs Functionality not what expected

The function get_game_logs('Kobe Bryant', '2010-01-12', '2010-01-20', playoffs=False) returns a 73 row pandas dataframe that has every game from 2009-10-27 to 2010-04-11.

Expectation: To return only game logs of Kobe that were between the dates provided, inclusively.

A similar (incorrect) result is seen when inputting:
get_game_logs('Pau Gasol', '2010-01-12', '2010-01-20', playoffs=False)
get_game_logs('Kobe Bryant', '2010-01-12', '2011-01-20', playoffs=False)
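
Until the function filters by date itself, the returned rows can be trimmed afterwards. The sketch below shows the inclusive comparison on plain dictionaries to stay dependency-free; the same condition can be applied to a DataFrame's DATE column. `filter_by_date` and the sample rows are hypothetical.

```python
from datetime import date
from typing import Dict, List


def filter_by_date(rows: List[Dict[str, str]], start: str, end: str) -> List[Dict[str, str]]:
    """Keep rows whose ISO-formatted DATE lies in [start, end], inclusive."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    return [row for row in rows if first <= date.fromisoformat(row['DATE']) <= last]


# Sample rows (made-up values, for illustration only).
logs = [
    {'DATE': '2009-10-27', 'PTS': '33'},
    {'DATE': '2010-01-12', 'PTS': '16'},
    {'DATE': '2010-01-20', 'PTS': '20'},
    {'DATE': '2010-04-11', 'PTS': '14'},
]
print(filter_by_date(logs, '2010-01-12', '2010-01-20'))
```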

get_stats() function Dummy values result in NaN

The following row has two dummy stats, which cause later fields to be off, and the empty dummies result in NaN (not a number).

<tr><th class="left " data-stat="season" scope="row">1 season</th><td class="center iz" data-stat="age"></td><td class="left " data-stat="team_id"><a href="https://www.basketball-reference.com/teams/IND/">IND</a></td><td class="left " data-stat="lg_id">NBA</td><td class="center iz" data-stat="pos"></td><td class="right " data-stat="g">65</td><td class="right " data-stat="mp">894</td><td class="right " data-stat="per">9.5</td><td class="right " data-stat="ts_pct">.507</td><td class="right " data-stat="fg3a_per_fga_pct">.427</td><td class="right " data-stat="fta_per_fga_pct">.133</td><td class="right " data-stat="orb_pct">2.3</td><td class="right " data-stat="drb_pct">6.3</td><td class="right " data-stat="trb_pct">4.3</td>
<td class="right " data-stat="ast_pct">20.7</td><td class="right " data-stat="stl_pct">1.4</td><td class="right " data-stat="blk_pct">0.9</td><td class="right " data-stat="tov_pct">17.2</td><td class="right " data-stat="usg_pct">19.2</td><td class="right iz" data-stat="DUMMY"></td><td class="right " data-stat="ows">-0.2</td><td class="right " data-stat="dws">0.5</td><td class="right " data-stat="ws">0.3</td><td class="right " data-stat="ws_per_48">.016</td><td class="right iz" data-stat="DUMMY"></td><td class="right " data-stat="obpm">-2.4</td><td class="right " data-stat="dbpm">-1.2</td><td class="right " data-stat="bpm">-3.7</td><td class="right " data-stat="vorp">-0.4</td></tr>

For example, (This specific example is for Aaron Brooks)

['2016-17', 32.0, 'IND', 'NBA', 'PG', 65.0, 894.0, 9.5, 0.507, 0.427, 0.133, 2.3, 6.3, 4.3, 20.7, 1.4, 0.9, 17.2, 19.2, nan, -0.2, 0.5, 0.3, 0.016, nan, -2.4, -1.2, -3.7, -0.4]

This happens for every single player, and the dummy may be in different locations.
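
Because each cell carries a data-stat attribute, one way to avoid the shift is to key values by that attribute and skip cells marked DUMMY, instead of relying on column position. The sketch below does this with the standard-library HTML parser. `StatRowParser` and `parse_stat_row` are hypothetical names, and the sample row is a shortened version of the markup above.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class StatRowParser(HTMLParser):
    """Collect cell text keyed by data-stat, ignoring DUMMY spacer cells."""

    def __init__(self) -> None:
        super().__init__()
        self.stats: Dict[str, str] = {}
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in ('td', 'th'):
            stat = dict(attrs).get('data-stat')
            # Spacer cells are skipped entirely, so they never become NaN columns.
            self._current = None if stat == 'DUMMY' else stat
            if self._current is not None:
                self.stats[self._current] = ''

    def handle_data(self, data):
        if self._current is not None:
            self.stats[self._current] += data

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._current = None


def parse_stat_row(html: str) -> Dict[str, str]:
    parser = StatRowParser()
    parser.feed(html)
    parser.close()
    return parser.stats


row = ('<tr><th data-stat="season">2016-17</th>'
       '<td data-stat="team_id"><a href="/teams/IND/">IND</a></td>'
       '<td data-stat="usg_pct">19.2</td>'
       '<td class="right iz" data-stat="DUMMY"></td>'
       '<td data-stat="ows">-0.2</td></tr>')
print(parse_stat_row(row))
```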

Can't get game logs for 2020-2021 season

When running the following code:

game = get_game_logs('Chris Paul', '2020-12-22', '2020-12-30', playoffs=False)
print(game)

The output is:

Results for Chris Paul:

Empty DataFrame
Columns: [DATE, AGE, TEAM, HOME/AWAY, OPPONENT, RESULT, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, FT, FTA, FT%, ORB, DRB, TRB, AST, STL, BLK, TOV, PF, PTS, GAME_SCORE, +/-]
Index: []

I can pull game logs from previous seasons with no problem using the method above. Please let me know if I am not using the function correctly.

EDIT: I was able to get Blake Griffin's and Jamal Murray's 2020-2021 game log, but haven't been able to obtain any others for this season yet.

Can't get playoff game logs for players to work

Hi all,

Bear with me, I'm very novice with Python and instead of finding out how to scrape Basketball-Reference myself I figured why not try this module first.

The game logs for regular season games seem to work fine using get_game_logs('LaMarcus Aldridge', '2013-04-20', '2017-08-01', playoffs=False), but if I want to look at playoff game logs and do get_game_logs('LaMarcus Aldridge', '2013-04-20', '2017-08-01', playoffs=True) I get the following error on the first try:

C:\Users\myname\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py:1115: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = method(y)

What I get is an empty Pandas Dataframe. I also tried this with Damian Lillard and it also didn't work.

Am I doing something wrong?

P.S. I tried get_game_logs('LeBron James', '2006-04-01', '2020-11-01', True) and this seems to work partially: I get a DataFrame containing the latest game logs, but all game logs before 2011 are missing.

Needs unidecode to work properly

This package depends on unidecode to work properly, yet doesn't list unidecode as a requirement. I'll try to submit a fix as a pull request later today, but I'm fine if someone else does it.
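
Until the requirement is declared, a quick check can report which modules are absent from the current environment. This is a diagnostic sketch using the standard-library `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and the module list is taken from the errors reported on this page.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


print(missing_modules(['unidecode', 'lxml', 'pandas']))
```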

lookup with ask_matches = False doesn't return closest name

When ask_matches = False, the lookup function needs to sort the list as it does when ask_matches is True and the list has more than one entry:
matches.sort(key=lambda tup: tup[1])
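
A small sketch of picking the closest candidate after sorting. It assumes that matches is a list of (name, score) tuples and that a higher score means a closer match; neither assumption is confirmed by the report, so the direction is left configurable. `closest_match` is a hypothetical helper.

```python
from typing import List, Tuple


def closest_match(matches: List[Tuple[str, float]], higher_is_better: bool = True) -> str:
    """Return the name with the best score from (name, score) tuples."""
    if not matches:
        raise ValueError('no candidate names')
    ordered = sorted(matches, key=lambda tup: tup[1], reverse=higher_is_better)
    return ordered[0][0]


# Made-up scores, for illustration only.
candidates = [('Steph Curry', 0.60), ('Stephen Curry', 0.95), ('Seth Curry', 0.70)]
print(closest_match(candidates))
```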

Duplicate player names returns only the first match

In utils.py get_player_suffix, only the first player name match is returned, which fails when there are players with the same names, for example, Dee Brown:

https://www.basketball-reference.com/players/b/brownde01.html
https://www.basketball-reference.com/players/b/brownde03.html

You might add a get_players method which returns a list of dfs for each player that matches, since there aren't that many players with duplicates at the moment (by my count, there are only doubles, not triples, and <20 in the past 30 years).

Players with double spaced names are not returned, or the wrong player's data is returned.

Bug: when there are 2 players with the same name, it always returns the player with suffix 01

Try the player 'Lary Nance'.
The returned data is always for this page:
https://www.basketball-reference.com/players/n/nancela01.html

although I'm looking for this page:
https://www.basketball-reference.com/players/n/nancela02.html
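
The URLs quoted in these reports (brownde01, brownde03, nancela01, nancela02) suggest a pattern: the first five letters of the last name, the first two of the first name, and a two-digit counter. Assuming that inferred pattern holds, a caller could enumerate candidate pages instead of stopping at 01. `candidate_suffixes` is a hypothetical helper, not part of the library.

```python
from typing import List


def candidate_suffixes(first: str, last: str, max_index: int = 3) -> List[str]:
    """Build candidate player-page suffixes for every counter up to max_index."""
    stem = (last[:5] + first[:2]).lower()
    return [f'/players/{last[0].lower()}/{stem}{i:02d}.html' for i in range(1, max_index + 1)]


print(candidate_suffixes('Larry', 'Nance', max_index=2))
```

Each candidate would still need to be fetched and checked (for example, by comparing career years) to decide which player is meant.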

get_schedule raises a ValueError if 2020 is used for season.

play_by_play

play_by_play returns an error when trying the replit example in documentation
full stack trace below

Traceback (most recent call last):
File "main.py", line 6, in
print(client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16))
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/client.py", line 220, in play_by_play
values = http_service.play_by_play(home_team=home_team, day=day, month=month, year=year)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/http_service.py", line 106, in play_by_play
away_team_name=page.away_team_name,
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/html.py", line 791, in away_team_name
return self.team_names[0]
IndexError: list index out of range

get_team_misc(team, 2021)

This is not scraping the most up-to-date misc. stats for each team. It looks like it is grabbing stats that are a day behind.

get_player_headshot doesn't have an ask_matches option

Can you please add this :)

Parameters 'RANK' and 'Y/Y' return error

Various functions state that the data format can be 'RANK' or 'Y/Y'; however, when run as such, they return either 'ValueError: No tables found' or 'UnboundLocalError: local variable 'selector' referenced before assignment'. What is the proper usage for RANK, and what is the 'Y/Y' data format?

`get_roster` fails when a player's nationality is missing

Hi @vishaalagartha,

Thanks for this package! I ran into an issue while using the basketball_reference_scraper.teams.get_roster function.

Error

Here's the code to produce the error:

from basketball_reference_scraper.teams import get_roster

get_roster("FTW", 1956)

Here's the error:

~/.pyenv/versions/3.8.5/envs/global/lib/python3.8/site-packages/basketball_reference_scraper/teams.py in <lambda>(x)
     21         df['PLAYER'] = df['PLAYER'].apply(lambda name: remove_accents(name, team, season_end_year))
     22         df['BIRTH_DATE'] = df['BIRTH_DATE'].apply(lambda x: pd.to_datetime(x))
---> 23         df['NATIONALITY'] = df['NATIONALITY'].apply(lambda x: x.upper())
     24     return df
     25 

AttributeError: 'float' object has no attribute 'upper'

If you look at the page for FTW, 1956, you'll see there's no nationality listed for Chuck Noble. The resulting value is then NaN, which is a float and so has no upper method.

Solution

Using the built-in pd.Series.str accessor should address this issue:

df['NATIONALITY'] = df['NATIONALITY'].str.upper()
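
A minimal sketch on a toy DataFrame (the player names and values are made up) illustrates that the .str accessor passes missing values through unchanged instead of raising:

```python
import numpy as np
import pandas as pd

# Toy roster; the NaN mimics a player with no nationality listed.
df = pd.DataFrame({'PLAYER': ['Player A', 'Player B'],
                   'NATIONALITY': ['us', np.nan]})

# .apply(lambda x: x.upper()) would raise AttributeError on the NaN (a float).
# The .str accessor skips missing values and leaves them missing.
df['NATIONALITY'] = df['NATIONALITY'].str.upper()
print(df)
```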

get_roster crashes when NaN name appears in the team's roster table

The Miami Heat 2021 roster is given here. The table contains an empty row, which causes get_roster to crash because remove_accents(name, team, season_end_year) receives a NaN instead of a string. One solution would be to drop the NaN players in get_roster one line earlier, e.g.

        df = df[df['PLAYER'].notna()]
        df['PLAYER'] = df['PLAYER'].apply(lambda name: remove_accents(name, team, season_end_year))

This should also be combined with an update to utils.remove_accents; for example, line 54 should be

matches = sum(l1 == l2 for l1, l2 in zip(p, name)) if pd.notna(p) else 0

By the way, what is the purpose of the remove_accents() function?

get_schedule() returns a ValueError: Length mismatch

Running

schedule = get_schedule(season, playoffs=True)

Gives a ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements due to the following line in seasons.py (line 28):

df.columns = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS']

It looks like the dataframe now includes an "ARENA" column, so the corrected line should be

df.columns = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS', 'ARENA']

EDIT: I posted a PR to address this here.
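
A more defensive variant, sketched here on toy DataFrames with made-up values, is to name only as many columns as the scraped table actually has, so that a five-column or six-column table both work. The helper name_schedule_columns is hypothetical and illustrates the idea only; it is not necessarily the change made in the PR:

```python
import pandas as pd

EXPECTED = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS', 'ARENA']


def name_schedule_columns(df):
    """Label the leading columns of a schedule table without assuming its width."""
    if len(df.columns) > len(EXPECTED):
        # Fail with a clear message if the page grows yet another column.
        raise ValueError(f"unexpected column count: {len(df.columns)}")
    df = df.copy()
    df.columns = EXPECTED[:len(df.columns)]
    return df


# Toy stand-ins for the scraped table, with and without the arena column.
six = name_schedule_columns(
    pd.DataFrame([['2020-01-01', 'AAA', 100, 'BBB', 98, 'Some Arena']]))
five = name_schedule_columns(
    pd.DataFrame([['2020-01-01', 'AAA', 100, 'BBB', 98]]))
print(list(six.columns))
print(list(five.columns))
```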

get_all_star_box_score(2012) returns error

Attempting to get the box score for the 2012 All-Star Game raises an out-of-range indexing error on a player's last name.

Code to reproduce:

from basketball_reference_scraper.box_score import get_all_star_box_score

print(get_all_star_box_score(2012))

Traceback (most recent call last):
  File "/home/jd/git/basketball_reference_scraper/test/test_box_scores.py", line 25, in test_get_all_star_box_score
    d = get_all_star_box_score(2012)
  File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/box_scores.py", line 88, in get_all_star_box_score
    stats_df = get_stats(dnp, ask_matches=False)
  File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/players.py", line 14, in get_stats
    suffix = get_player_suffix(name)
  File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/utils.py", line 90, in get_player_suffix
    initial = last_name_part[0].lower()
IndexError: string index out of range
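
The traceback shows last_name_part[0] being indexed on an empty string. A minimal sketch of a guard follows; last_name_initial is a hypothetical helper, not part of the library, and it assumes the last name is simply the final whitespace-separated word of the input:

```python
def last_name_initial(name):
    """Return the lowercase first letter of the last name, or None if absent."""
    parts = name.split()
    if len(parts) < 2:
        # Empty or single-word input: there is no last name to index into.
        return None
    return parts[-1][0].lower()


print(last_name_initial('LeBron James'))
print(last_name_initial(''))
```

A caller can then skip or report entries for which the helper returns None instead of crashing on them.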

from basketball_reference_scraper.drafts import get_draft_class - No module named 'basketball_reference_scraper.drafts'

Hello,
I get the following error when running:
from basketball_reference_scraper.drafts import get_draft_class
ModuleNotFoundError: No module named 'basketball_reference_scraper.drafts'

I tested with Anaconda and also with Google Colab. Other imports from the tutorial page work without error.

Thanks for looking into this,
Regards
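
One way to check whether the installed release actually ships the drafts module is to ask the import system directly. This is a small sketch using only the standard library; has_module is a hypothetical helper name. If it prints False for drafts while other submodules print True, the installed version may predate that module, which is a guess at the cause rather than a confirmed diagnosis:

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the module can be found (parent packages get imported)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package itself cannot be imported.
        return False


print(has_module('basketball_reference_scraper.drafts'))
```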

get game logs not working as expected

A normal call like get_game_logs('Thabo Sefolosha', '2013-08-01', '2014-02-02') returns the correct game logs, whereas for some players, e.g. get_game_logs('DeMarcus Cousins', '2013-08-01', '2014-02-02'), nothing is returned. This also happens for seemingly random other players (Patty Mills, Gerald Henderson, etc.). It may be due to the bball-ref widget function.

can't get data on non Latin names like Nikola Jokić

The code works perfectly fine on names with plain Latin letters, but it fails when I try names containing ć.
I tried to get the data for "Nikola Jokic", and after that returned empty, I tried the name as stored in the same database. Then the code failed.
The failure is at line 29 of utils.py.
This is my code:

d = get_box_scores('2020-01-06', 'DEN', 'ATL')
niko = d['DEN']['PLAYER'][0]
temp = get_game_logs(niko, "2019-10-10", "2020-10-10", playoffs=False)

How does get_stats handle players with uncommon names?

When I try getting Dario Šarić's stats, I get the following error:

Traceback (most recent call last):
File "C:\Users\Alexis\Downloads\winter2020\eecs497\algo.py", line 37, in <module>
print(get_stats("Dario Šarić"))
File "C:\Program Files\Python38\Lib\site-packages\basketball_reference_scraper\players.py", line 11, in get_stats
suffix = get_player_suffix(name).replace('/', '%2F')
AttributeError: 'NoneType' object has no attribute 'replace'

What format do names have to be in?
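
One possible workaround is to try an ASCII-folded version of the name. The sketch below uses the standard library's unicodedata module; strip_accents is a hypothetical helper, and whether the scraper then resolves the folded name is an assumption, since the Jokić report above found that the unaccented spelling returned empty results:

```python
import unicodedata


def strip_accents(name):
    """Drop combining marks after Unicode decomposition."""
    decomposed = unicodedata.normalize('NFKD', name)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents('Dario Šarić'))   # Dario Saric
print(strip_accents('Nikola Jokić'))  # Nikola Jokic
```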

2020 Schedule

Previous 2020 schedule does not seem to be working. Keep up the great work!