
basketball_reference_scraper's People

Contributors

aaroncolesmith, adoreste319, ashwin153, bellmatthewf, ces-d, dannyjgibson, dependabot[bot], dhilon, diego-escobedo, gianmarcofolchi, gracejz, jjnotjimmyjohn, johnwmillr, jsnider3, justinbt21, kennethpham, lucasz-hu, rdraward, rudiejd, sammeyerson, tspen, vishaalagartha


basketball_reference_scraper's Issues

can't install using pip3

The error I'm running into:

ERROR: Command errored out with exit status 1:
command: /Library/Developer/CommandLineTools/usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/tmp/pip-install-adjqudls/pandas/setup.py'"'"'; file='"'"'/private/tmp/pip-install-adjqudls/pandas/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /private/tmp/pip-install-adjqudls/pandas/
Complete output (101 lines):
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_philox.pyx
Traceback (most recent call last):
File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 59, in process_pyx
from Cython.Compiler.Version import version as cython_version
ModuleNotFoundError: No module named 'Cython'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 235, in <module>
    main()
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 231, in main
    find_process_files(root_dir)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 222, in find_process_files
    process(root_dir, fromfile, tofile, function, hash_db)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 188, in process
    processor_function(fromfile, tofile)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/tools/cythonize.py", line 64, in process_pyx
    raise OSError('Cython needs to be installed in Python as a module')
OSError: Cython needs to be installed in Python as a module
Running from numpy source directory.
/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py:485: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
  run_build = parse_setuppy_commands()
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 154, in save_modules
    yield saved
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 513, in <module>
    #  we can't do anything about these warnings because they stem from
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 493, in setup_package

  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 290, in generate_cython
    for pxifile in _pxifiles:
RuntimeError: Running cythonize failed!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/tmp/pip-install-adjqudls/pandas/setup.py", line 809, in <module>
    setup(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/dist.py", line 716, in fetch_build_eggs
    resolved_dists = pkg_resources.working_set.resolve(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 780, in resolve
    dist = best[req.key] = env.best_match(
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1065, in best_match
    return self.obtain(req, installer)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1077, in obtain
    return installer(requirement)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/dist.py", line 786, in fetch_build_egg
    return cmd.easy_install(req)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 679, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 705, in install_item
    dists = self.install_eggs(spec, download, tmpdir)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 890, in install_eggs
    return self.build_and_install(setup_script, setup_base)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1158, in build_and_install
    self.run_setup(setup_script, setup_base, args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/command/easy_install.py", line 1144, in run_setup
    run_setup(setup_script, args)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 253, in run_setup
    raise
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 166, in save_modules
    saved_exc.resume()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 141, in resume
    six.reraise(type, exc, self._tb)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/_vendor/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 154, in save_modules
    yield saved
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 195, in setup_context
    yield
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 250, in run_setup
    _execfile(setup_script, ns)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/site-packages/setuptools/sandbox.py", line 45, in _execfile
    exec(code, globals, locals)
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 513, in <module>
    #  we can't do anything about these warnings because they stem from
  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 493, in setup_package

  File "/tmp/easy_install-3lddwx71/numpy-1.20.2/setup.py", line 290, in generate_cython
    for pxifile in _pxifiles:
RuntimeError: Running cythonize failed!
Cythonizing sources
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

^ This is the error I'm having trouble with.

Nothing is working

I followed the code in the example and I keep getting the error, "No Tables"

Retrieving per game stats for 'Clint Capela' leads to AttributeError: 'NoneType' object has no attribute 'replace'

Running a fresh pip install of this project inside a jupyter notebook.

from basketball_reference_scraper.players import get_stats, get_game_logs
_stats = get_stats('Clint Capela', stat_type='PER_GAME', playoffs=False, career=False, ask_matches=False)

Produces this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-7b87a5c104b7> in <module>
      1 from basketball_reference_scraper.players import get_stats, get_game_logs
----> 2 _stats = get_stats('Clint Capela', stat_type='PER_GAME', playoffs=False, career=False, ask_matches=False)

/opt/anaconda3/lib/python3.8/site-packages/basketball_reference_scraper/players.py in get_stats(_name, stat_type, playoffs, career, ask_matches)
     12 def get_stats(_name, stat_type='PER_GAME', playoffs=False, career=False, ask_matches = True):
     13     name = lookup(_name, ask_matches)
---> 14     suffix = get_player_suffix(name).replace('/', '%2F')
     15     selector = stat_type.lower()
     16     if playoffs:

AttributeError: 'NoneType' object has no attribute 'replace'

This code works with other players like 'nikola jokic'.
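The traceback shows `get_player_suffix` returning `None` for a name it cannot resolve, and `.replace` is then called on `None`. A minimal sketch of a guard that turns this into a readable error, assuming a lookup function that returns either a suffix string or `None` (`url_safe_suffix` and the sample suffix are hypothetical, not part of the package):

```python
from typing import Callable, Optional


def url_safe_suffix(name: str, find_suffix: Callable[[str], Optional[str]]) -> str:
    """Return the player's URL suffix with '/' escaped, or raise a clear error.

    `find_suffix` stands in for a lookup such as get_player_suffix (assumption).
    """
    suffix = find_suffix(name)
    if suffix is None:
        # Fail with a readable message instead of AttributeError on None.replace
        raise LookupError(f"No Basketball Reference page found for {name!r}")
    return suffix.replace("/", "%2F")
```

Calling it with a name the lookup cannot resolve raises `LookupError` naming the player, which makes it clear the problem is the name lookup rather than the stats table.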

No module named 'constants'

Note: I just installed this and had to install lxml manually with pip, using the wheel lxml-4.4.2-cp37-cp37m-win_amd64.whl.
Here is the traceback, it seems like it can't find constants.py for some reason:

Traceback (most recent call last):
  File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\teams.py", line 6, in <module>
    from constants import TEAM_TO_TEAM_ABBR, TEAM_SETS
ModuleNotFoundError: No module named 'constants'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/yishi/PycharmProjects/baksetball-algo/scraper.py", line 1, in <module>
    from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
  File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\teams.py", line 10, in <module>
    from basketball_reference_scraper.utils import remove_accents
  File "C:\Users\yishi\PycharmProjects\baksetball-algo\venv\lib\site-packages\basketball_reference_scraper\utils.py", line 4, in <module>
    import unicodedata, unidecode
ModuleNotFoundError: No module named 'unidecode'
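Reading the traceback, the `constants` import error appears to be handled and the import that actually fails is `unidecode`. A small sketch for checking which third-party modules are missing before importing the package; the module list is illustrative (taken from this issue), not the package's full requirements:

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules mentioned in this issue; extend the list as needed.
    missing = missing_modules(["lxml", "unidecode"])
    if missing:
        print("Install first: pip install " + " ".join(missing))
```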

Missing Charlotte Hornets CHH

Hello vishaalagartha,

Thanks for the very helpful API.

Just something to bring to your attention.

The Charlotte Hornets were founded in 1988. The Charlotte team was called the Charlotte Bobcats from 2004 until 2014, when it changed its name back to the Charlotte Hornets.

However, Basketball Reference uses two different abbreviations for the Charlotte Hornets of 1988-04 and the Charlotte Hornets of 2004-present.
These two abbreviations are:

Charlotte Hornets 1988-04: CHH
Charlotte Hornets 2004-present: CHO

Using CHO for any season before 2004 results in an error.

Thanks again!
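A minimal sketch of picking the abbreviation from the season end year. The cutoffs are an assumption to verify against the site: besides CHH and CHO from this issue, Basketball Reference season URLs also appear to use CHA for the Bobcats years, and no Charlotte team played in the seasons ending 2003 and 2004. `charlotte_abbr` is a hypothetical helper:

```python
def charlotte_abbr(season_end_year: int) -> str:
    """Map a season end year to the Charlotte abbreviation (assumed cutoffs).

    CHH: original Hornets, 1988-89 to 2001-02
    CHA: Bobcats, 2004-05 to 2013-14
    CHO: Hornets again, 2014-15 onward
    """
    if 1989 <= season_end_year <= 2002:
        return "CHH"
    if 2005 <= season_end_year <= 2014:
        return "CHA"
    if season_end_year >= 2015:
        return "CHO"
    raise ValueError(f"No Charlotte team played in the season ending {season_end_year}")
```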

get_team_misc(team, 2021)

Team misc. stats are correct on the website the day after games, but they are not being scraped correctly: the newest games are missing.
Here are the Celtics Team and Opponent Stats and Misc. as of 1:54 PM 1/26/2021. (Day after game 16).

This is what I am getting for output. Notice how team and opponent per game stats are up to date but misc. is not.

Can't set ask_matches = False.

For get_stats in players.py: if I call get_stats('Stephen Curry', 'PER_GAME', False, False, False), I get an error that get_stats only takes 1 to 4 arguments. If I specify ask_matches=False, I get an error that ask_matches is an unexpected keyword argument. Is there any way to turn the ask_matches option off?
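The "1 to 4 arguments" message suggests the installed copy of get_stats predates the ask_matches parameter. A sketch for checking that with the standard `inspect` module; `accepts_keyword` is a hypothetical helper, and upgrading the package is the assumed remedy if it returns `False`:

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter also accepts any keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

Usage: after `from basketball_reference_scraper.players import get_stats`, `accepts_keyword(get_stats, "ask_matches")` reports whether the installed version supports the option.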

Identify players with same name or same name different suffix

I had trouble scraping Jaren Jackson Jr.'s data using get_stats(); I think it returns his father's data. I also tried entering 'Jaren Jackson Jr.', but it returns the same result; I'm not sure whether the source code removes the suffix. Is there a workaround? Thanks
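One way to keep the suffix available for disambiguation is to split it off explicitly instead of discarding it. A sketch, assuming a short, non-exhaustive list of generational suffixes; `split_suffix` is hypothetical, and how the package would use the suffix to pick between candidates is not specified here:

```python
import re

# Generational suffixes to separate from the base name (assumed, non-exhaustive list).
_SUFFIX_RE = re.compile(r"\s+(Jr\.?|Sr\.?|II|III|IV)$", re.IGNORECASE)


def split_suffix(full_name: str):
    """Split 'Jaren Jackson Jr.' into ('Jaren Jackson', 'Jr.'); suffix is None if absent."""
    name = full_name.strip()
    match = _SUFFIX_RE.search(name)
    if match is None:
        return name, None
    return name[:match.start()], match.group(1)
```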

Player Game Logs Pandas Index numbering is incorrect

Recreation:
pg = get_game_logs('Pau Gasol', '2010-01-12', '2010-01-20', playoffs=False)
pg.loc[18:22,:] # return index of Pau's 18-22nd game for the season

Expectation:
Code should output the 4 games on 2009-12-06, 2009-12-09, 2009-12-11, and 2009-12-12 numbered as games 18,19,20,21 (offset by one because of 0-indexing) as seen here.

Reality:
Running the code above has those 4 games numbered as 18, 19, 21, 22. It appears that your program correctly skips the line that BBref uses as the header but still increments the index.
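The usual pandas remedy is to renumber the index after dropping the header rows. A minimal sketch on a toy frame (the values are made up) in which one row repeats the column headers, as Basketball Reference game-log tables do:

```python
import pandas as pd

# Toy game log: the row with G == "G" is a repeated header row (made-up values).
raw = pd.DataFrame({"G": ["1", "2", "G", "3"], "PTS": ["10", "12", "PTS", "8"]})

# Drop the embedded header rows, then renumber the index so labels are contiguous.
logs = raw[raw["G"] != "G"].reset_index(drop=True)
print(logs)
```

With `drop=True` the old index labels are discarded rather than kept as a column, so game n sits at index n - 1.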

get_team_stats error

How I'm using it:

get_team_stats('UTA', year, data_format='PER_GAME').to_frame().transpose()

Error I'm getting:

\site-packages\basketball_reference_scraper\teams.py:42: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
s['SEASON'] = f'{season_end_year-1}-{str(season_end_year)[2:]}'
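The warning is raised because a column is assigned on a row taken from another DataFrame. A sketch of the usual pandas fix, taking an explicit copy first; `team_row_with_season` is hypothetical and only mirrors the line shown in the warning:

```python
import pandas as pd


def season_label(season_end_year: int) -> str:
    """Same formula as the line in the warning, e.g. 2021 -> '2020-21'."""
    return f"{season_end_year - 1}-{str(season_end_year)[2:]}"


def team_row_with_season(df: pd.DataFrame, season_end_year: int) -> pd.Series:
    # .copy() makes an independent Series, so adding SEASON neither
    # touches df nor triggers SettingWithCopyWarning.
    s = df.iloc[0].copy()
    s["SEASON"] = season_label(season_end_year)
    return s
```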

Missing data

When I try to get stats from a team's roster, some columns are missing and replaced by a single column of "..." (no matter how many columns are missing, they are replaced by one column).

Player names with accents not rendering properly

It could be how I'm handling the data, but some player names (e.g., "Kristaps Porziņģis", "Luka Dončić") come back with strange characters in place of the accented characters when using get_roster() (Luka DonÄić, for example). I grabbed the function from here and tried adjusting the encoding and decoding, but had no luck. I'm sure there's a workaround for this, but I can't figure it out.
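Output like "DonÄ..." is the typical result of UTF-8 bytes being decoded as Latin-1. A sketch of reversing that, assuming Latin-1 was the wrong codec (if cp1252 was used instead, swap the codec name); decoding the raw response bytes as UTF-8 in the first place would avoid the problem:

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded this way; leave it unchanged
        return text


# Reproduce the problem, then repair it
garbled = "Luka Dončić".encode("utf-8").decode("latin-1")
print(repr(garbled))
print(fix_mojibake(garbled))
```

Correctly decoded names pass through unchanged, because characters such as "č" cannot be encoded as Latin-1.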

Full 2020 Season Schedule Not Displaying

When I use the function get_schedule(2020, playoffs=False), it returns the schedule from the season opener through 2019-10-31 but no games afterward. This seems to be the only season in which this is the case.

Thanks!

get_schedule() can't find result for 1953 and 1971 (season+playoffs)

from basketball_reference_scraper.seasons import get_schedule, get_standings
schedule1=get_schedule(1971,playoffs=False)

ValueError Traceback (most recent call last)
in
----> 1 schedule1=get_schedule(1971,playoffs=False)

/opt/anaconda3/lib/python3.7/site-packages/basketball_reference_scraper/seasons.py in get_schedule(season, playoffs)
16 soup = BeautifulSoup(r.content, 'html.parser')
17 table = soup.find('table', attrs={'id': 'schedule'})
---> 18 month_df = pd.read_html(str(table))[0]
19 df = df.append(month_df)
20 df = df.reset_index()

/opt/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
1103 na_values=na_values,
1104 keep_default_na=keep_default_na,
-> 1105 displayed_only=displayed_only,
1106 )

/opt/anaconda3/lib/python3.7/site-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
910 break
911 else:
--> 912 raise_with_traceback(retained)
913
914 ret = []

/opt/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
45 if traceback == Ellipsis:
46 _, _, traceback = sys.exc_info()
---> 47 raise exc.with_traceback(traceback)
48
49

ValueError: No tables found
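When `soup.find` finds no schedule table it returns `None`, and `str(None)` is the string "None", which `pd.read_html` rejects with "No tables found". A sketch of gathering monthly tables while recording the months that have none; `find_table_html` is a hypothetical stand-in for the fetch-and-find step, and the month names are illustrative:

```python
from typing import Callable, Iterable, List, Optional, Tuple


def collect_tables(months: Iterable[str],
                   find_table_html: Callable[[str], Optional[str]]) -> Tuple[List[str], List[str]]:
    """Gather table HTML per month, recording months where no table exists."""
    found, missing = [], []
    for month in months:
        html = find_table_html(month)
        if html is None:
            # Passing str(None) == "None" to pd.read_html would raise "No tables found"
            missing.append(month)
        else:
            found.append(html)
    return found, missing
```

Only the strings in `found` would then be passed to `pd.read_html`; the caller can raise a specific error if every month is missing.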

team roster per 36 values missing

After fiddling around with the web scraper for a while, I noticed that while using the get_roster_stats function, setting the data format variable to 'PER_36' returns the following:

Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
get_roster_stats('GSW', 2019, 'PER_36', False)
File "C:\Program Files\Python38\Lib\site-packages\basketball_reference_scraper\teams.py", line 98, in get_roster_stats
df2 = pd.read_html(str(table))[0]
File "C:\Program Files\Python38\Lib\site-packages\pandas\io\html.py", line 1090, in read_html
return _parse(
File "C:\Program Files\Python38\Lib\site-packages\pandas\io\html.py", line 912, in _parse
raise_with_traceback(retained)
File "C:\Program Files\Python38\Lib\site-packages\pandas\compat\__init__.py", line 47, in raise_with_traceback
raise exc.with_traceback(traceback)
ValueError: No tables found

However, Basketball Reference does publish these statistics on its website. Is there a fix for this?
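One commonly reported cause, not confirmed for this page, is that Basketball Reference ships some secondary tables inside HTML comments, so a parser never sees them as tables. A stdlib sketch that lists table ids with and without uncommenting; the ids in the test are hypothetical:

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Record the id attribute of every <table> start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            table_id = dict(attrs).get("id")
            if table_id:
                self.ids.append(table_id)


def table_ids(html: str, include_commented: bool = False):
    # Markup inside <!-- ... --> is reported as a comment, not as tags,
    # so stripping the comment markers exposes those tables.
    if include_commented:
        html = html.replace("<!--", "").replace("-->", "")
    parser = TableIdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids
```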

get_box_scores returning same player name multiple times or incorrect player name

box_scores.get_box_scores is returning the same player name multiple times on the same team or opponent and in some cases the player names are incorrect. The stats seem ok, just the player names. This seems to occur in many of the box scores I have looked at. Here are a few examples:

2018-11-30, ORL, PHO, GAME, BASIC - Aaron Gordon is returned correctly for ORL, but also in place of De'Anthony Melton for PHO.

2021-04-22, LAL, DAL, GAME, BASIC - Alex Caruso is returned correctly for LAL, but also in place of Luka Dončić and J.J. Redick for DAL. Looking deeper at this game, several of the DAL player names are incorrect.

playoff games logs error on function call

  • I tried get_game_logs('LeBron James', '2003-08-01', '2020-02-02', True)

and received the following stack trace:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/kwaku/Development/nba-data/venv/lib/python3.7/site-packages/basketball_reference_scraper/players.py", line 65, in get_game_logs
    if len(row['GS'])>1:
TypeError: object of type 'int' has no len()

  • function works when parameter playoffs=False
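The failing check calls `len()` on the GS cell, which pandas can parse as an integer when every value in that column is numeric. A sketch of a type-aware version, assuming the check is meant to detect status text (such as "Inactive") rather than a 0/1 start flag; `is_status_row` is hypothetical:

```python
def is_status_row(gs_value) -> bool:
    """True when the GS cell holds a status string instead of a 0/1 start flag."""
    # pandas may parse GS as int when all rows are numeric, so check the type first
    return isinstance(gs_value, str) and len(gs_value) > 1
```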

Error with get_stats and get_player_suffix

get_stats in players.py and get_player_suffix in utils.py seem to run into problems when they're either used with a player name with non-traditional characters or when used with names shared with other NBA players. The former causes no dataframes to be returned because it cannot construct a valid suffix and the latter causes a dataframe of the wrong player to be returned.

For example, the API sees no difference between Tim Hardaway Sr. and Tim Hardaway Jr.
Another example, Luka Dončić, Dario Šarić, and others are not handled well by get_player_suffix, resulting in an error.

I have fixed it somewhat on my end by tailoring the code to my needs--however, do let me know if a future version of your API fixes this, thanks!
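For the accent part, a common normalization is Unicode decomposition followed by dropping combining marks. A stdlib sketch; `strip_accents` is hypothetical, and letters with no decomposition (for example "đ") pass through unchanged, which the unidecode package handles more thoroughly:

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Drop combining marks, e.g. 'Luka Dončić' -> 'Luka Doncic'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```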

Remove (rather fix) accents in injury reports

Hello,

In box scores, player names with accents are fixed using utils.remove_accents; I think the same should be done for injury reports. This could in principle be done easily, like this:

from requests import get
from bs4 import BeautifulSoup
import pandas as pd
from basketball_reference_scraper.constants import TEAM_TO_TEAM_ABBR
from basketball_reference_scraper.utils import remove_accents

def get_injury_report():
    r = get(f'https://widgets.sports-reference.com/wg.fcgi?css=1&site=bbr&url=%2Ffriv%2Finjuries.fcgi&div=div_injuries')
    if r.status_code==200:
        soup = BeautifulSoup(r.content, 'html.parser')
        table = soup.find('table')
        df = pd.read_html(str(table))[0]
        df.rename(columns = {'Player': 'PLAYER', 'Team': 'TEAM', 'Update': 'DATE', 'Description': 'DESCRIPTION'}, inplace=True)
        df['TEAM'] = df['TEAM'].apply(lambda x: TEAM_TO_TEAM_ABBR[x.upper()])
        df['DATE'] = df['DATE'].apply(lambda x: pd.to_datetime(x))
        # Normalize accented names per row, using the team and the latest report year
        df['PLAYER'] = df.apply(lambda injury: remove_accents(injury.PLAYER, injury.TEAM, df['DATE'].max().year), axis=1)
        df['STATUS'] = df['DESCRIPTION'].apply(lambda x: x[:x.index('(')].strip())
        df['INJURY'] = df['DESCRIPTION'].apply(lambda x: x[x.index('(')+1:x.index(')')].strip())
        df['DESCRIPTION'] = df['DESCRIPTION'].apply(lambda x: x[x.index('-')+2:].strip())
        return df

However, I fear that might fail e.g. if someone got traded and there was no injury yet in the new year or so. What do you think?

Cheers :-)

ValueError: No Tables Found

I'm getting a "No tables found" error when I use get_box_scores, but only for yesterday's Washington Wizards vs. Philadelphia 76ers game. Is there any way I can prevent this or work around it? To clarify, I used the team abbreviations PHI and WAS and the date 2020-12-23. Also, the day before that, the box score for the Lakers and Clippers game listed Dennis Schroder as Danny Green; I'm not sure whether a fix is required for that. Overall, it's a really great API that I've been using.


get_game_logs Functionality not what expected

The function get_game_logs('Kobe Bryant', '2010-01-12', '2010-01-20', playoffs=False) returns a 73 row pandas dataframe that has every game from 2009-10-27 to 2010-04-11.

Expectation: To return only game logs of Kobe that were between the dates provided, inclusively.

A similar (incorrect) result is seen when inputting:
get_game_logs('Pau Gasol', '2010-01-12', '2010-01-20', playoffs=False)
get_game_logs('Kobe Bryant', '2010-01-12', '2011-01-20', playoffs=False)
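The expected behavior is an inclusive date filter. A stdlib sketch over rows with an ISO-formatted DATE field (`in_date_range` is hypothetical); on a DataFrame with a datetime DATE column, `Series.between(start, end)` expresses the same inclusive filter:

```python
from datetime import date


def in_date_range(rows, start: str, end: str):
    """Keep rows whose 'DATE' (YYYY-MM-DD) falls within [start, end], inclusive."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return [row for row in rows if lo <= date.fromisoformat(row["DATE"]) <= hi]
```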

get_stats() function Dummy values result in NaN

The following row has two DUMMY cells, which shift the later fields out of position, and the empty DUMMY cells come back as NaN (not a number).

<tr><th class="left " data-stat="season" scope="row">1 season</th><td class="center iz" data-stat="age"></td><td class="left " data-stat="team_id"><a href="https://www.basketball-reference.com/teams/IND/">IND</a></td><td class="left " data-stat="lg_id">NBA</td><td class="center iz" data-stat="pos"></td><td class="right " data-stat="g">65</td><td class="right " data-stat="mp">894</td><td class="right " data-stat="per">9.5</td><td class="right " data-stat="ts_pct">.507</td><td class="right " data-stat="fg3a_per_fga_pct">.427</td><td class="right " data-stat="fta_per_fga_pct">.133</td><td class="right " data-stat="orb_pct">2.3</td><td class="right " data-stat="drb_pct">6.3</td><td class="right " data-stat="trb_pct">4.3</td>
<td class="right " data-stat="ast_pct">20.7</td><td class="right " data-stat="stl_pct">1.4</td><td class="right " data-stat="blk_pct">0.9</td><td class="right " data-stat="tov_pct">17.2</td><td class="right " data-stat="usg_pct">19.2</td><td class="right iz" data-stat="DUMMY"></td><td class="right " data-stat="ows">-0.2</td><td class="right " data-stat="dws">0.5</td><td class="right " data-stat="ws">0.3</td><td class="right " data-stat="ws_per_48">.016</td><td class="right iz" data-stat="DUMMY"></td><td class="right " data-stat="obpm">-2.4</td><td class="right " data-stat="dbpm">-1.2</td><td class="right " data-stat="bpm">-3.7</td><td class="right " data-stat="vorp">-0.4</td></tr>

For example, this row (for Aaron Brooks) is returned as:

['2016-17', 32.0, 'IND', 'NBA', 'PG', 65.0, 894.0, 9.5, 0.507, 0.427, 0.133, 2.3, 6.3, 4.3, 20.7, 1.4, 0.9, 17.2, 19.2, nan, -0.2, 0.5, 0.3, 0.016, nan, -2.4, -1.2, -3.7, -0.4]

This happens for every single player, and the dummy may be in different locations.
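Because every cell carries a `data-stat` attribute, one option is to key values by that attribute and skip the DUMMY spacers, so their position no longer matters. A stdlib sketch of that idea (`StatRowParser` and `parse_stat_row` are hypothetical):

```python
from html.parser import HTMLParser


class StatRowParser(HTMLParser):
    """Collect (data-stat, text) pairs from th/td cells, skipping DUMMY spacer cells."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._stat = None  # data-stat of the cell currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._stat = dict(attrs).get("data-stat")
            self._text = []

    def handle_data(self, data):
        if self._stat is not None:
            self._text.append(data)  # includes text nested in <a> links

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._stat is not None:
            if self._stat != "DUMMY":
                self.cells.append((self._stat, "".join(self._text).strip()))
            self._stat = None


def parse_stat_row(html: str) -> dict:
    parser = StatRowParser()
    parser.feed(html)
    parser.close()
    return dict(parser.cells)
```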

Can't get game logs for 2020-2021 season

When running the following code:

game = get_game_logs('Chris Paul', '2020-12-22', '2020-12-30', playoffs=False)
print(game)

The output is:

Results for Chris Paul:

Empty DataFrame
Columns: [DATE, AGE, TEAM, HOME/AWAY, OPPONENT, RESULT, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, FT, FTA, FT%, ORB, DRB, TRB, AST, STL, BLK, TOV, PF, PTS, GAME_SCORE, +/-]
Index: []

I can pull game logs from previous seasons with no problem using the method above. Please let me know if I am not using the function correctly.

EDIT: I was able to get Blake Griffin's and Jamal Murray's 2020-2021 game log, but haven't been able to obtain any others for this season yet.

Can't get playoff game logs for players to work

Hi all,

Bear with me, I'm very new to Python, and instead of figuring out how to scrape Basketball-Reference myself, I figured I'd try this module first.

The game logs for regular season games seem to work fine using get_game_logs('LaMarcus Aldridge', '2013-04-20', '2017-08-01', playoffs=False), but if I want to look at playoff game logs and do get_game_logs('LaMarcus Aldridge', '2013-04-20', '2017-08-01', playoffs=True) I get the following error on the first try:

C:\Users\myname\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py:1115: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = method(y)

What I get is an empty Pandas Dataframe. I also tried this with Damian Lillard and it also didn't work.

Am I doing something wrong?

P.S. I tried get_game_logs('LeBron James', '2006-04-01', '2020-11-01', True) and it seems to work partially: I get a DataFrame containing the latest game logs, but all game logs before 2011 are missing.

Needs unidecode to work properly

This package depends on unidecode to work properly, yet doesn't list unidecode as a requirement. I'll try to submit a fix as a pull request later today, but I'm fine if someone else does it.

Duplicate player names returns only the first match

In utils.py get_player_suffix, only the first player name match is returned, which fails when there are players with the same names, for example, Dee Brown:

https://www.basketball-reference.com/players/b/brownde01.html
https://www.basketball-reference.com/players/b/brownde03.html

You might add a get_players method which returns a list of dfs for each player that matches, since there aren't that many players with duplicates at the moment (by my count, there are only doubles, not triples, and <20 in the past 30 years).
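A get_players method could start by enumerating candidate ids and then keep the ones whose pages exist. A sketch of the id and URL construction, assuming the scheme visible in the two URLs above (first five letters of the last name, first two of the first name, a two-digit counter); accents, punctuation and short names are not handled, and both helpers are hypothetical:

```python
def candidate_player_ids(first: str, last: str, max_n: int = 5):
    """Build ids such as 'brownde01', 'brownde02', ... (assumed id scheme)."""
    stem = (last[:5] + first[:2]).lower()
    return [f"{stem}{n:02d}" for n in range(1, max_n + 1)]


def player_url(player_id: str) -> str:
    """Player page URL in the format shown above."""
    return f"https://www.basketball-reference.com/players/{player_id[0]}/{player_id}.html"
```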

play_by_play

play_by_play returns an error when trying the replit example in the documentation.
Full stack trace below:

Traceback (most recent call last):
File "main.py", line 6, in <module>
print(client.play_by_play(home_team=Team.BOSTON_CELTICS, year=2018, month=10, day=16))
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/client.py", line 220, in play_by_play
values = http_service.play_by_play(home_team=home_team, day=day, month=month, year=year)
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/http_service.py", line 106, in play_by_play
away_team_name=page.away_team_name,
File "/opt/virtualenvs/python3/lib/python3.8/site-packages/basketball_reference_web_scraper/html.py", line 791, in away_team_name
return self.team_names[0]
IndexError: list index out of range

get_team_misc(team, 2021)


This is not scraping the most up-to-date misc. stats for each team. It looks like it is grabbing stats that are a day behind.

Parameters 'RANK' and 'Y/Y' return error

Various functions state that the data format can be 'RANK' or 'Y/Y'; however, when run this way they return either 'ValueError: No tables found' or 'UnboundLocalError: local variable 'selector' referenced before assignment'. What is the proper usage of RANK, and what is the 'Y/Y' data format?
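An `UnboundLocalError` like this usually means `selector` is only assigned inside `if`/`elif` branches and none of them matched the given format. A sketch of validating the format up front with a lookup table; the mapping values are hypothetical and may not match the table ids the package actually uses:

```python
# Hypothetical mapping from data_format to a table selector.
SELECTORS = {"TOTALS": "totals", "PER_GAME": "per_game", "PER_36": "per_minute"}


def selector_for(data_format: str) -> str:
    """Return the selector for a supported format, or raise a descriptive ValueError."""
    try:
        return SELECTORS[data_format.upper()]
    except KeyError:
        valid = ", ".join(sorted(SELECTORS))
        raise ValueError(
            f"Unsupported data_format {data_format!r}; expected one of: {valid}"
        ) from None
```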

`get_roster` fails when a player's nationality is missing

Hi @vishaalagartha,

Thanks for this package! I ran into an issue while using the basketball_reference_scraper.teams.get_roster function.

Error

Here's the code to produce the error:

from basketball_reference_scraper.teams import get_roster

get_roster("FTW", 1956)

Here's the error:

~/.pyenv/versions/3.8.5/envs/global/lib/python3.8/site-packages/basketball_reference_scraper/teams.py in <lambda>(x)
     21         df['PLAYER'] = df['PLAYER'].apply(lambda name: remove_accents(name, team, season_end_year))
     22         df['BIRTH_DATE'] = df['BIRTH_DATE'].apply(lambda x: pd.to_datetime(x))
---> 23         df['NATIONALITY'] = df['NATIONALITY'].apply(lambda x: x.upper())
     24     return df
     25 

AttributeError: 'float' object has no attribute 'upper'

If you look at the page for FTW, 1956, you'll see there's no nationality listed for Chuck Noble. The resulting value is then NaN, which doesn't have an upper function.

Solution

Using the built-in pd.DataFrame.str method should address this issue:

df['NATIONALITY'] = df['NATIONALITY'].str.upper()
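A quick check of the proposed fix on made-up values: the `.str` accessor skips missing entries instead of calling `.upper()` on a float. One caveat: if a column contained no strings at all (for example an all-NaN float column), pandas would refuse the `.str` accessor.

```python
import pandas as pd

# A nationality column with one missing value, as for Chuck Noble (made-up values)
nationality = pd.Series(["us", float("nan"), "ca"])

# .str.upper() leaves NaN as NaN instead of raising AttributeError
upper = nationality.str.upper()
print(upper.tolist())
```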

get_roster crashes when NaN name appears in the team's roster table

The Miami Heat 2021 roster is given here. The table contains an empty row, which causes get_roster to crash because remove_accents(name, team, season_end_year) receives NaN instead of a string. One solution would be to drop rows with NaN player names in get_roster one line earlier, e.g.

        df = df[df['PLAYER'].notna()]
        df['PLAYER'] = df['PLAYER'].apply(lambda name: remove_accents(name, team, season_end_year))

This should also be combined with an update to utils.remove_accents; for example, line 54 should be

matches = sum(l1 == l2 for l1, l2 in zip(p, name)) if pd.notna(p) else 0

By the way, what is the purpose of the remove_accents() function?

get_schedule() returns a ValueError: Length mismatch

Running

schedule = get_schedule(season, playoffs=True)

Gives a ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements due to the following line in seasons.py (line 28):

df.columns = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS']

It looks like the dataframe now includes an "ARENA" column, so the corrected line should be

df.columns = ['DATE', 'VISITOR', 'VISITOR_PTS', 'HOME', 'HOME_PTS', 'ARENA']

EDIT: I posted a PR to address this here.

get_all_star_box_score(2012) returns error

Attempting to get the box score for the 2012 all-star game returns an error about out of range indexing for a player's last name

Code to reproduce:

from basketball_reference_scraper.box_score import get_all_star_box_score

print(get_all_star_box_score(2012))
Traceback (most recent call last):
  File "/home/jd/git/basketball_reference_scraper/test/test_box_scores.py", line 25, in test_get_all_star_box_score
    d = get_all_star_box_score(2012)
  File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/box_scores.py", line 88, in get_all_star_box_score
    stats_df = get_stats(dnp, ask_matches=False)
  File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/players.py", line 14, in get_stats
    suffix = get_player_suffix(name)
  File "/home/jd/git/basketball_reference_scraper/basketball_reference_scraper/utils.py", line 90, in get_player_suffix
    initial = last_name_part[0].lower()
IndexError: string index out of range
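One plausible cause, not confirmed for this box score, is splitting on a single space: a stray trailing or double space produces an empty last-name part, and indexing `''[0]` raises exactly this IndexError. A sketch of a safer split; `split_name` is hypothetical:

```python
def split_name(full_name: str):
    """Return (first, last), tolerating extra whitespace; reject single-word names."""
    parts = full_name.split()  # no-arg split drops empty strings from extra spaces
    if len(parts) < 2:
        raise ValueError(f"Expected a first and last name, got {full_name!r}")
    return parts[0], " ".join(parts[1:])
```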

get game logs not working as expected

A normal call like get_game_logs('Thabo Sefolosha', '2013-08-01', '2014-02-02') returns the correct game logs, whereas for some players, e.g. get_game_logs('DeMarcus Cousins', '2013-08-01', '2014-02-02'), it returns nothing. This also happens for other seemingly random players (Patty Mills, Gerald Henderson, etc.). It may be due to the Basketball Reference widget function.

can't get data on non Latin names like Nikola Jokić

The code works perfectly fine on names without accents, but when I try names containing ć, it fails.
I first tried "Nikola Jokic", and when that returned an empty result, I tried the name exactly as it is stored in the box score data. Then the code failed.
The code fails at line 29 in utils.py.
This is my code:

d = get_box_scores('2020-01-06', 'DEN', 'ATL')
niko = d['DEN']['PLAYER'][0]
temp = get_game_logs(niko, "2019-10-10", "2020-10-10", playoffs=False)

How does get_stats handle players with uncommon names?

When I try getting Dario Šarić's stats, I get the following error:

Traceback (most recent call last):
File "C:\Users\Alexis\Downloads\winter2020\eecs497\algo.py", line 37, in <module>
print(get_stats("Dario Šarić"))
File "C:\Program Files\Python38\Lib\site-packages\basketball_reference_scraper\players.py", line 11, in get_stats
suffix = get_player_suffix(name).replace('/', '%2F')
AttributeError: 'NoneType' object has no attribute 'replace'

What format do names have to be in?

2020 Schedule

Previous 2020 schedule does not seem to be working. Keep up the great work!
