
procyclingstats's People

Contributors

baronet2, bmserras, cheeseycube, leapingllamas, lytixdev, selektormode, themm1

procyclingstats's Issues

climbs() method for Stage scraper produces empty list

climbs_html = self.html.css_first("div > ul.list.circle")

Was following along with the examples/climbs_by_stages.py example, but all values in the stages_climbs dict were empty lists. Found that it was the climbs() method for Stage scraper not grabbing them from their respective pages. I resolved this locally by changing the CSS selector:

climbs_html = self.html.css_first("ul.list.circle") 

Worth noting that I installed procyclingstats from pip, and when doing so the version of selectolax that was installed in my virtual environment was 0.3.12, which is different than the version listed in requirements.txt (0.3.8). Not sure if related.
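
The local fix above swaps a strict selector for a looser one. A minimal sketch of a more defensive variant is to try the strict selector first and fall back to the looser one. The names first_match and StubNode are hypothetical and not part of procyclingstats; StubNode only imitates the css_first method of a parsed page so that the snippet runs without the library or network access.

```python
from typing import Iterable, Optional


def first_match(node, selectors: Iterable[str]):
    """Return the first non-None result of node.css_first(selector).

    `node` is anything exposing css_first(selector), such as a selectolax
    node. Returns None when no selector matches, so the caller can decide
    whether an empty list or an error is appropriate.
    """
    for selector in selectors:
        found = node.css_first(selector)
        if found is not None:
            return found
    return None


class StubNode:
    """Hypothetical stand-in for a parsed page (assumption, for the demo only)."""

    def __init__(self, matches: dict):
        self._matches = matches

    def css_first(self, selector: str) -> Optional[str]:
        return self._matches.get(selector)


# Simulates a page where only the looser selector matches.
page = StubNode({"ul.list.circle": "<ul class='list circle'>...</ul>"})
climbs_html = first_match(page, ["div > ul.list.circle", "ul.list.circle"])
missing = first_match(page, ["table.climbs"])
```

Whether a fallback like this is preferable to simply changing the selector depends on how often the page layout changes; it is only one possible approach.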

Results of one day race

Dear themm1, thank you for all the great work you already did on this Procyclingstats scraper.

I was wondering if it is possible to retrieve the results of a certain one day race? In my specific case, I am only interested in obtaining the winner.

I already tried several things, such as scraping the URL (e.g. race/paris-roubaix/2024/result) or calling Race.stage_winners(). Unfortunately, none of them worked.

Do you perhaps know of any method to retrieve the winner of a one day race?

Best regards.

Race results

Is it possible to add the results of a race, i.e. parse the table shown on, for example, /race/gp-samyn/2024/result?

One day races ranking scraping

Hello,

Your work is fantastic.
Unfortunately, the one-day race ranking is treated as a race ranking, so the parsing fails.
Can you adapt the code for one-day races?

I have the issue with the default:
https://www.procyclingstats.com/rankings/me/uci-one-day-races

or with a filtered request:
https://www.procyclingstats.com/rankings.php?date=2022-12-31&nation=&age=&zage=&page=smallerorequal&team=&offset=0&filter=Filter&p=me&s=one-day-races

Here is the error:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/pavz/Library/Python/3.9/lib/python/site-packages/procyclingstats/scraper.py", line 112, in parse
    parsed_data[method_name] = method()
  File "/Users/pavz/Library/Python/3.9/lib/python/site-packages/procyclingstats/ranking_scraper.py", line 212, in races_ranking
    table_parser.parse(fields)
  File "/Users/pavz/Library/Python/3.9/lib/python/site-packages/procyclingstats/table_parser.py", line 102, in parse
    raise UnexpectedParsingError(message)
procyclingstats.errors.UnexpectedParsingError: Field 'stage_name' wasn't parsed correctly

Thank you

Error with Stage when no profile score

pcs.Stage('https://www.procyclingstats.com/race/tour-du-gevaudan-languedoc-roussillon/2015/stage-2').parse()

Produces the error

IndexError: list index out of range

Invalid URL

When I run even basic commands, like:

from procyclingstats import RiderResults

results = RiderResults("rider/tadej-pogacar")
print(results.results())

I get the following error:

ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 3>()
      1 from procyclingstats import RiderResults
----> 3 results = RiderResults("rider/tadej-pogacar")
      4 print(results.results())

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/procyclingstats/scraper.py:50, in Scraper.__init__(self, url, html, update_html)
     48 self.update_html()
     49 if not self._html_valid():
---> 50     raise ValueError(
     51         f"HTML from given URL is invalid: '{self.url}'")
     52 self._set_up_html()

ValueError: HTML from given URL is invalid: 'https://www.procyclingstats.com/rider/tadej-pogacar'

The URL seems correct, so what is the problem?

Keyerror for getattr(stage, classification)

The line data = getattr(stage, classification)() raises an error that seems to come from inside the pcs code:

       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "C:\Users%username%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\stage_scraper.py", line 298, in results
    table = join_tables(table, table_parser.table, "rider_url")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users%username%\AppData\Local\Programs\Python\Python311\Lib\site-packages\procyclingstats\utils.py", line 162, in join_tables
    table.append({**table2_dict[row[join_key]], **row})
                    ~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 'rider/laurens-de-plus'

race.stages() throws UnexpectedParsingError

Trying to parse stages from a specific Race, with the code provided by your repo in the file examples\climbs_by_stages.py

RACE_URL = "race/tour-de-france/2022"
race = Race(f"{RACE_URL}/overview")
race_climbs = RaceClimbs(f"{RACE_URL}/route/climbs")
stages = race.stages()

Throws exception UnexpectedParsingError with error

procyclingstats.errors.UnexpectedParsingError: Field 'profile_icon' wasn't parsed correctly

I had a look at the error, and it seems that your code can't parse the stages table because of one extra 'sum' row that totals the kilometres of all stages. I suggest removing this row before the table is parsed.
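
The suggestion above can be sketched as a filter that drops any row lacking a stage link. This is only an illustration under stated assumptions: it works on rows that have already been extracted into dicts, whereas the library would have to filter the HTML rows themselves; the function name drop_summary_rows, the stage_url key, and the sample values are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_summary_rows(rows: List[Row], required_key: str = "stage_url") -> List[Row]:
    """Keep only rows that carry a non-empty value for required_key.

    A trailing total row has no stage link, so it is filtered out.
    """
    return [row for row in rows if row.get(required_key)]


# Made-up rows; the last one imitates the extra 'sum' row of the stages table.
rows = [
    {"stage_url": "race/example-race/2022/stage-1", "distance": 10.0},
    {"stage_url": "race/example-race/2022/stage-2", "distance": 150.0},
    {"stage_url": None, "distance": 160.0},
]
stages = drop_summary_rows(rows)
```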

Feature Request to add "age" method to Rider class

It would be nice to scrape the age from a given rider, rather than having to calculate it after the fact from their birthdate. To that end I have created a pull request with the addition of an age() method in the Rider class.

startlist() fails

Thanks for all the great work on this!
Happy to help with some MRs when I get time this weekend, if you're open to it.

Currently, parsing with startlist() appears to be broken. When I run the example code

race_startlist = RaceStartlist("race/tour-de-france/2022/startlist")
race_startlist.startlist()

I get the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 2
      1 race_startlist = RaceStartlist("race/tour-de-france/2022/startlist")
----> 2 race_startlist.startlist()

File ~/racestats/.venv/lib/python3.8/site-packages/procyclingstats/race_startlist_scraper.py:96, in RaceStartlist.startlist(self, *args)
     93 startlist_html = self.html.css_first(".startlist_v3")
     94 # startlist is individual startlist e.g.
     95 # race/tour-de-pologne/2009/gc/startlist
---> 96 if startlist_html.css_first("li.team") is None:
     97     startlist_html = self.html.css_first(".page-content > div")
     98 startlist_table = []

AttributeError: 'NoneType' object has no attribute 'css_first'

stage results Table Parser fails on results with empty teams or ages

The Stage Results parser fails when some rows have empty values for age or team.

In some races riders do not have a team, or PCS does not know all ages. In that case a blank value is displayed.

For example: https://www.procyclingstats.com/race/nc-germany-we/2023/result

For the age it fails because an empty string can't be cast to an int. (table_parser.py:212)
This can be fixed by not casting the age to int and leaving it as a string instead. I don't know if this is safe.

For the teams it fails because the teams are found using _filter_a_elements. But an empty cell has no a_element and thus raises an UnexpectedParsingError.
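
As an alternative to keeping the age as a string, a blank cell could be mapped to None while non-blank cells are still cast to int. The following is a minimal sketch of that idea, not the library's code; parse_optional_int is a hypothetical helper name. The same "missing value becomes None" approach could in principle apply to a team cell that has no link.

```python
from typing import Optional


def parse_optional_int(text: Optional[str]) -> Optional[int]:
    """Convert cell text to int, returning None for blank or missing cells."""
    if text is None:
        return None
    stripped = text.strip()
    if not stripped:
        return None
    return int(stripped)


# Cell texts as they might come out of a results table with an unknown age.
ages = [parse_optional_int(cell) for cell in ["27", "", " 31 "]]
```

Callers would then need to handle None explicitly, which is a change in the return type and may or may not be acceptable for existing users.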

Img

Would it be possible to also scrape the images of riders?

Thanks

Problem scraping when trying to run the method team.riders()

When running procyclingstats==0.1.7, this code:

from procyclingstats import Team


team = Team("team/bora-hansgrohe-2022")
print(team.riders())

results in this error:

Traceback (most recent call last):
  File "C:\Users\user\Documenten\test.py", line 12, in <module>
    print(team.riders())
  File "C:\Users\user\anaconda3\lib\site-packages\procyclingstats\team_scraper.py", line 183, in riders
    table_parser = TableParser(career_points_table_html)
  File "C:\Users\user\anaconda3\lib\site-packages\procyclingstats\table_parser.py", line 31, in __init__
    table_body = html_table.css_first("tbody")
AttributeError: 'NoneType' object has no attribute 'css_first'

Question about your api

Hi, when I use your API, how exactly does it work? For example, I use this in my Python application:

from procyclingstats import Rider
rider = Rider("rider/tadej-pogacar")
rider.birthdate()
"1998-9-21"

When does the webscraping happen? Is it scraping locally from my device or am I making calls to a db/server that you created using the scraping? When is the data updated?

Thanks for your answer

Problem scraping if rider DNF first stage

$ python test.py
[
Traceback (most recent call last):
File "/Users/colin/marketcetera/workspaces/procyclingstats/code/procyclingstats/examples/test.py", line 27, in
pprint(stage.parse())
File "/Users/colin/Library/Python/3.10/lib/python/site-packages/procyclingstats/scraper.py", line 112, in parse
parsed_data[method_name] = method()
File "/Users/colin/Library/Python/3.10/lib/python/site-packages/procyclingstats/stage_scraper.py", line 298, in results
table = join_tables(table, table_parser.table, "rider_url")
File "/Users/colin/Library/Python/3.10/lib/python/site-packages/procyclingstats/utils.py", line 162, in join_tables
table.append({**table2_dict[row[join_key]], **row})
KeyError: 'rider/laurens-de-plus'

In the first stage of the '23 Vuelta, Laurens de Plus crashed and did not finish (DNF). Not sure if this is the source of the issue or not. Sample script attached to reproduce.

test.py.txt
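
The traceback above shows the failure at a plain dictionary lookup, table2_dict[row[join_key]], which raises KeyError when a rider appears in one table but not in the other. A minimal sketch of a tolerant join, assuming both tables are lists of dicts, could look like this. The name join_tables_tolerant and the sample rows are hypothetical, and this is not necessarily how the library resolved the issue.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_tables_tolerant(table1: List[Row], table2: List[Row], join_key: str) -> List[Row]:
    """Join two tables on join_key, keeping rows of table1 that have no partner.

    Rows missing from table2 are kept with only their own fields instead of
    raising KeyError.
    """
    table2_by_key = {row[join_key]: row for row in table2}
    joined = []
    for row in table1:
        extra = table2_by_key.get(row[join_key], {})
        # Values from table1 take precedence, as in the original expression.
        joined.append({**extra, **row})
    return joined


# Made-up data: the second rider has no row in the second table.
results = [
    {"rider_url": "rider/example-rider", "rank": 1},
    {"rider_url": "rider/laurens-de-plus", "status": "DNF"},
]
bonuses = [{"rider_url": "rider/example-rider", "bonus": 10}]
joined = join_tables_tolerant(results, bonuses, "rider_url")
```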

Max number of races is limited to 100?

Hi!

I have created a small custom function

def fetch_rider_results(rider_url):
    rider_results = RiderResults(rider_url + "/results")
    rider_results_JSON = rider_results.parse()
    return rider_results_JSON

When I call this function, for instance

st.write(fetch_rider_results("rider/jonas-vingegaard"))

I get a JSON object, but the races and stages only go from 0 to 99. Would it be possible to fetch, for instance, the results for the last 5 years instead?

Best regards
Kasper

team.wins_count() throws a ValueError when the team has no wins

The wins_count() function located in team_scraper.py assumes that the retrieved html text will be a valid integer, but throws an error when there are no wins because procyclingstats.com displays that with a dash instead of a zero. You can see what I mean below.

return int(wins_count_html.text())

I will be submitting a pull request with a simple fix if you would like to merge it.
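
One way to handle the dash is to treat it as zero before casting. The sketch below illustrates the idea and is not the content of the pull request mentioned above; parse_count is a hypothetical helper, and it assumes the placeholder is a plain hyphen, so other dash characters would have to be added if the site uses them.

```python
def parse_count(text: str) -> int:
    """Convert a counter cell to int, treating a blank or a dash as zero."""
    cleaned = text.strip()
    if cleaned in {"", "-"}:
        return 0
    return int(cleaned)


# Example inputs: a team with wins and a team whose page shows a dash.
with_wins = parse_count("12")
no_wins = parse_count("-")
```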

error with RaceStartlist

Getting this error when I use RaceStartlist.

     94 for team_html in startlist_html.css(".ridersCont"):
     95     riders_table = team_html.css_first("ul")
---> 96     table_parser = TableParser(riders_table)
     97     rider_f_to_parse = [f for f in casual_rider_fields if f in fields]
     98     table_parser.parse(rider_f_to_parse)

AttributeError: 'NoneType' object has no attribute 'css_first'

I see this has come up before, but I am using 0.1.6.

Thanks

rider_number in Stage.results, points, kom, gc, youth

Hi @themm1, after many years of not coding I feel like quite a rookie in Python, but I have succeeded in writing some code to scrape results (all stage results) for an xlsx-driven cycling game I always play with some 100 friends. It is extremely useful to be able to load all results.

However, some time after the recent Giro (all scraping worked perfectly during the Giro), the rider_number column appears to have disappeared from Stage.results, points, kom, gc and youth. Also, the startlist seems empty. From the API documentation I read that rider_number should only be available in RaceStartlist.startlist, but it was in the results as well. In the HTML code the BIBs seem to be available in the results pages anyway.

As a BIB number is much easier and more consistent to match, my Excel sheet is driven by the BIB number rather than the rider_name. A workaround that looks up the rider_number in RaceStartlist.startlist() and puts it in the dataframe I need for the results is possible, but not desirable if the data is also available in the results. Can you explain this issue?
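
The workaround mentioned above can be sketched with plain dicts. It assumes that both the startlist rows and the result rows carry a rider_url field, and that the startlist rows carry rider_number; the helper name add_rider_numbers and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_rider_numbers(results: List[Row], startlist: List[Row]) -> List[Row]:
    """Return copies of the result rows with rider_number taken from the startlist.

    Riders missing from the startlist get rider_number None.
    """
    number_by_url = {row["rider_url"]: row.get("rider_number") for row in startlist}
    return [
        {**row, "rider_number": number_by_url.get(row["rider_url"])}
        for row in results
    ]


# Made-up rows standing in for RaceStartlist.startlist() and Stage.results() output.
startlist = [
    {"rider_url": "rider/example-one", "rider_number": 1},
    {"rider_url": "rider/example-two", "rider_number": 2},
]
results = [
    {"rider_url": "rider/example-two", "rank": 1},
    {"rider_url": "rider/example-three", "rank": 2},
]
results_with_bibs = add_rider_numbers(results, startlist)
```

The resulting list of dicts can be loaded into a dataframe in the same way as the unmodified results.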

many thx!!

race.stages() fails

Seems like the site has changed. I get the following when using race.stages()

AttributeError                            Traceback (most recent call last)

----> 8         race.stages()

in stages(self, *args)

----> 172         if self.is_one_day_race():
    173             return []

in is_one_day_race(self)

     73         one_day_race_html = self.html.css_first("div.sub > span.blue")
---> 74         return "stage" not in one_day_race_html.text().lower()
     75 

AttributeError: 'NoneType' object has no attribute 'text'
