
fifa-stats-crawler's Introduction

Python supported versions GPLv3 license PRs Welcome

Football Players Statistics WebCrawler

This project is a sub-module for Multiplayer Football Draft Simulator.

About

A web crawler that scrapes all football players' information from Sofifa and exports it to JSON. Data cleaning and analytics are then performed on the scraped data.

  • Crawler: built on Scrapy using Python 3
  • Analytics: IPython notebooks (Python 3)

The data is further exported to the Football Draft backend and served from an endpoint.

Steps to run the project

Easy Run

chmod +x ./run.sh
./run.sh

Manual Setup and Run

  • Set up a virtualenv (optional, but recommended)

    virtualenv -p python3.8 env
    source env/bin/activate
    
  • Install project dependencies

    pip install -r requirements.txt
  • Run the crawler with fifa_crawler as the current directory (this is the main Scrapy crawler directory)

    cd fifa_crawler
    
  • First run the URL spider (to collect all player URLs)

    scrapy crawl players_urls
  • After it completes successfully, run the stats spider (to scrape each player's statistics from the URLs collected above)

    scrapy crawl players_stats
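Each player page is fetched with metric units pinned (the `?units=mks` query visible in the crawl log in the issues below). As an illustrative sketch rather than the project's actual code, a small helper that builds such a URL from a player id might look like:

```python
def player_url(player_id: str, base: str = "https://sofifa.com") -> str:
    """Build the Sofifa player page URL the stats spider visits.

    The units=mks query pins metric units (cm/kg), matching the
    height/weight fields in the exported JSON. This helper is a
    hypothetical sketch, not code from the repository.
    """
    return f"{base}/player/{player_id}?units=mks"


print(player_url("158023"))  # https://sofifa.com/player/158023?units=mks
```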

Scope/Aim as an individual project

Future features

  • Add analysis projects on the crawled data.
  • Update the crawler to scrape team data (it currently collects player data only).
  • Improve the crawler's speed.

Metadata

The exported player object contains the following fields:
id
  • type: string

  • example: "158023"

name
  • type: string

  • example: "Lionel Andrés Messi Cuccittini"

short_name
  • type: string

  • example: "L. Messi"

photo_url
  • type: string (URL)
primary_position
  • type: string

  • example: "RW"

positions
  • type: string[]

  • example: ["RW", "ST", "CF"]

age
  • type: string

  • example: "33"

birth_date
  • type: string (formatted as YYYY/Mon/DD, with an abbreviated English month name)

  • example: "1987/Jun/24"
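The birth_date format maps directly onto Python's `datetime.strptime` with the `%Y/%b/%d` pattern (assuming an English locale for the abbreviated month name):

```python
from datetime import datetime

# Parse the birth_date format used in the export, e.g. "1987/Jun/24".
born = datetime.strptime("1987/Jun/24", "%Y/%b/%d").date()
print(born.isoformat())  # 1987-06-24
```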

height
  • type: integer (in cm)

  • example: 170

weight
  • type: integer (in kg)

  • example: 72

Overall Rating
  • type: integer

  • example: 93

Potential
  • type: integer

  • example: 93

Value
  • type: string (in euros)

  • example: "€103.5M"

Wage
  • type: string (in euros)

  • example: "€560K"
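Value, Wage, and Release Clause are kept as display strings; converting them to whole euros takes a small helper (a sketch for illustration, not part of the crawler):

```python
def parse_euros(value: str) -> int:
    """Convert a Sofifa money string ("€103.5M", "€560K") to whole euros.

    Hypothetical helper: the crawler itself stores these as raw strings.
    """
    multipliers = {"M": 1_000_000, "K": 1_000}
    number = value.lstrip("€")
    suffix = number[-1]
    factor = multipliers.get(suffix, 1)
    if suffix in multipliers:
        number = number[:-1]
    return int(float(number) * factor)


print(parse_euros("€103.5M"))  # 103500000
```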

Preferred Foot
  • type: enum["Left", "Right"]

  • example: "Left"

Weak Foot
  • type: integer (range 1-5)

  • example: 4

Skill Moves
  • type: integer (range 1-5)

  • example: 4

International Reputation
  • type: integer (range 0-5)

  • example: 5

Work Rate
  • type: string (attacking/defensive work rates, each one of "High", "Medium", "Low")

  • example: "Medium/Low"

Body Type
  • type: enum["Unique", "Normal (170-)", "Normal (170-185)", "Normal (185+)", "Lean (170-185)", "Lean (185+)", "Stocky (170-)", "Stocky (170-185)", "Stocky (185+)"]

  • example: "Unique"

Real Face
  • type: enum["Yes", "No"]

  • example: "Yes"

Release Clause
  • type: string (in euros)

  • example: "€212.2M"

teams
  • type: map<string, integer> (including international and domestic clubs)

  • example:

{
    "FC Barcelona": 84,
    "Argentina": 83
}
attacking
  • type: map<attackOptions, integer>
attackOptions
  • type: enum["Crossing", "Finishing", "HeadingAccuracy", "ShortPassing", "Volleys"]
  • example:
{
    "Crossing": 85,
    "Finishing": 95,
    "HeadingAccuracy": 70,
    "ShortPassing": 91,
    "Volleys": 88
}
skill
  • type: map<skillOptions, integer>
skillOptions
  • type: enum["Dribbling", "Curve", "FKAccuracy", "LongPassing", "BallControl"]
  • example:
{
    "Dribbling": 96,
    "Curve": 93,
    "FKAccuracy": 94,
    "LongPassing": 91,
    "BallControl": 96
}
movement
  • type: map<movementOptions, integer>
movementOptions
  • type: enum["Acceleration", "SprintSpeed", "Agility", "Reactions", "Balance"]
  • example:
{
    "Acceleration": 91,
    "SprintSpeed": 80,
    "Agility": 91,
    "Reactions": 94,
    "Balance": 95
}
power
  • type: map<powerOptions, integer>
powerOptions
  • type: enum["ShotPower", "Jumping", "Stamina", "Strength", "LongShots"]
  • example:
{
    "ShotPower": 86,
    "Jumping": 68,
    "Stamina": 72,
    "Strength": 69,
    "LongShots": 94
}
mentality
  • type: map<mentalityOptions, integer>
mentalityOptions
  • type: enum["Aggression", "Interceptions", "Positioning", "Vision", "Penalties", "Composure"]
  • example:
{
    "Aggression": 44,
    "Interceptions": 40,
    "Positioning": 93,
    "Vision": 95,
    "Penalties": 75,
    "Composure": 96
}
defending
  • type: map<defendingOptions, integer>
defendingOptions
  • type: enum["DefensiveAwareness", "StandingTackle", "SlidingTackle"]
  • example:
{
    "DefensiveAwareness": 32,
    "StandingTackle": 35,
    "SlidingTackle": 24
}
goalkeeping
  • type: map<goalkeepingOptions, integer>
goalkeepingOptions
  • type: enum["GKDiving", "GKHandling", "GKKicking", "GKPositioning", "GKReflexes"]
  • example:
{
    "GKDiving": 6,
    "GKHandling": 11,
    "GKKicking": 15,
    "GKPositioning": 14,
    "GKReflexes": 8
}
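Each of the six attribute groups above (attacking, skill, movement, power, mentality, goalkeeping) is a flat map of integer sub-ratings, so per-category aggregates are one-liners. For example, averaging the goalkeeping block shown above:

```python
# Average a category map, e.g. the "goalkeeping" example above.
goalkeeping = {
    "GKDiving": 6,
    "GKHandling": 11,
    "GKKicking": 15,
    "GKPositioning": 14,
    "GKReflexes": 8,
}

average = sum(goalkeeping.values()) / len(goalkeeping)
print(average)  # 10.8
```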
player_traits
  • type: string["Technical Dribbler (AI)","Long Shot Taker (AI)","Flair","Speed Dribbler (AI)","Injury Prone","Long Passer (AI)","Playmaker (AI)","Power Header","Dives Into Tackles (AI)","Outside Foot Shot","Team Player","Finesse Shot","Leadership","Solid Player","Early Crosser","Long Throw-in","Comes For Crosses","Power Free-Kick","GK Long Throw","Cautious With Crosses","Rushes Out Of Goal","Saves with Feet","Chip Shot (AI)","Giant Throw-in","One Club Player"]

  • example:

[
    "Finesse Shot",
    "Long Shot Taker (AI)",
    "Speed Dribbler (AI)",
    "Playmaker (AI)",
    "Outside Foot Shot",
    "One Club Player",
    "Team Player",
    "Chip Shot (AI)"
]
player_hashtags
  • type: string["#Strength","#Acrobat","#Engine","#Speedster","#Dribbler","#Aerial Threat","#Tactician","#FK Specialist","#Crosser","#Distance Shooter","#Clinical Finisher","#Playmaker","#Tackling","#Complete Midfielder","#Complete Forward","#Poacher","#Complete Defender"] (Each tag starts with #)

  • example:

[
    "#Dribbler",
    "#Distance Shooter",
    "#FK Specialist",
    "#Acrobat",
    "#Clinical Finisher",
    "#Complete Forward"
]
logos
  • type: map<groupNames, logoAttributes>
groupNames
  • type: enum["country", "club", "nationalClub"]
logoAttributes
  • type: map<enum["name", "url"], string>

  • logoAttributes examples:

{
    "name": "Argentina",
    "url": "https://cdn.sofifa.com/flags/ar.png"
}
  • examples:
{
    "country": {
        "name": "Argentina",
        "url": "https://cdn.sofifa.com/flags/ar.png"
    },
    "club": {
        "name": "FC Barcelona",
        "url": "https://cdn.sofifa.com/teams/241/60.png"
    },
    "nationalClub": {
        "name": "Argentina",
        "url": "https://cdn.sofifa.com/teams/1369/60.png"
    }
}
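Once both spiders have run, the exported JSON can be consumed directly. The filename below is illustrative; check the crawler's pipeline settings for the actual output path:

```python
import json


def load_players(path: str) -> list:
    """Load the exported player objects from a JSON file.

    The path is supplied by the caller; "players_stats.json" below is
    a hypothetical example, not the crawler's confirmed output name.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# e.g. players = load_players("players_stats.json")
```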

Contributing to the Project

We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:

  • Reporting a bug
  • Discussing the current state of the code
  • Submitting a fix
  • Proposing new features

Making a PR

  • Fork the repo and clone it on your machine.

  • Add an upstream remote pointing at the main repo in your cloned copy

     git remote add upstream https://github.com/sauravhiremath/fifa-stats-crawler.git
    
    
  • Keep your cloned repo up to date by pulling from upstream (this also avoids merge conflicts when committing new changes)

    git pull upstream master
    
  • Create your feature branch

    git checkout -b <feature-name>
    
  • Commit all the changes

    git commit -am "Meaningful commit message"
    
  • Push the changes for review

    git push origin <branch-name>
    
  • Open a PR from your fork against this repo on GitHub.

Additional Notes

  • Code should be properly commented to ensure its readability.
  • If you've added code that should be tested, add tests.
  • In Python, use docstrings to provide doctest-style tests.
  • Make sure your code is properly formatted.
  • Issue that pull request!

Issue suggestions/Bug reporting

When creating an issue, make sure it isn't already present. Furthermore, provide a proper description of the changes. If you are suggesting any code improvements, provide thorough details about the improvements.

Great Issue suggestions tend to have:

  • A quick summary of the changes.
  • In case of a bug, provide steps to reproduce:
    • Be specific!
    • Give sample code if you can.
    • What you expected to happen
    • What actually happens
    • Notes (possibly including why you think this might be happening, or things you tried that didn't work)

Additional References:

A more detailed step-by-step guide with pictures for creating a pull request can be found here

fifa-stats-crawler's People

Contributors

akram9, sauravhiremath


fifa-stats-crawler's Issues

Crawler Update

New data to be collected

  • Player Id
  • Player Short names
  • Most played player position
  • Club and country logo URLs

Old Data to be fixed

  • TBD

players_stats does not start

The crawl seems to start, but when I proceed to run the "scrapy crawl players_stats" command it gives me the following error:

2023-06-19 17:39:26 [scrapy.utils.log] INFO: Scrapy 2.9.0 started (bot: fifa_parser)
2023-06-19 17:39:26 [scrapy.utils.log] INFO: Versions: lxml 4.8.0.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.21.0, Twisted 22.2.0, Python 3.9.12 (main, Apr 5 2022, 01:53:17) - [Clang 12.0.0 ], pyOpenSSL 21.0.0 (OpenSSL 1.1.1n 15 Mar 2022), cryptography 3.4.8, Platform macOS-10.16-x86_64-i386-64bit
2023-06-19 17:39:26 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'fifa_parser',
'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
'NEWSPIDER_MODULE': 'fifa_parser.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['fifa_parser.spiders'],
'USER_AGENT': 'sofifa (+http://www.yourdomain.com)'}
2023-06-19 17:39:26 [py.warnings] WARNING: /Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/request.py:232: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
return cls(crawler)

2023-06-19 17:39:26 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2023-06-19 17:39:26 [scrapy.extensions.telnet] INFO: Telnet Password: 37f25bbb0301924b
2023-06-19 17:39:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2023-06-19 17:39:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2023-06-19 17:39:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-06-19 17:39:26 [scrapy.middleware] INFO: Enabled item pipelines:
['fifa_parser.pipelines.JsonPipeline']
2023-06-19 17:39:26 [scrapy.core.engine] INFO: Spider opened
2023-06-19 17:39:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-06-19 17:39:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-06-19 17:39:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/robots.txt> (referer: None)
2023-06-19 17:39:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://sofifa.com/player/158023?units=mks> (referer: None)
2023-06-19 17:39:27 [scrapy.core.scraper] ERROR: Spider error processing <GET https://sofifa.com/player/158023?units=mks> (referer: None)
Traceback (most recent call last):
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/defer.py", line 260, in iter_errback
yield next(it)
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/python.py", line 336, in next
return next(self.data)
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/utils/python.py", line 336, in next
return next(self.data)
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/spidermiddlewares/offsite.py", line 28, in
return (r for r in result or () if self._filter(r, spider))
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/spidermiddlewares/referer.py", line 352, in
return (self._set_referer(r, response) for r in result or ())
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/spidermiddlewares/urllength.py", line 27, in
return (r for r in result or () if self._filter(r, spider))
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/spidermiddlewares/depth.py", line 31, in
return (r for r in result or () if self._filter(r, response, spider))
File "/Users/alessandroagostinelli/opt/anaconda3/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 106, in process_sync
for r in iterable:
File "/Users/alessandroagostinelli/Documents/fifa-stats-crawler-master/fifa-crawler/fifa_parser/spiders/players_stats.py", line 89, in parse
age, month, day, year, height, weight = player_info[0].split()
IndexError: list index out of range
2023-06-19 17:39:27 [scrapy.core.engine] INFO: Closing spider (finished)
2023-06-19 17:39:27 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 463,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 15655,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'elapsed_time_seconds': 0.428898,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2023, 6, 19, 15, 39, 27, 373180),
'httpcompression/response_bytes': 84341,
'httpcompression/response_count': 2,
'log_count/DEBUG': 3,
'log_count/ERROR': 1,
'log_count/INFO': 10,
'log_count/WARNING': 1,
'memusage/max': 68419584,
'memusage/startup': 68419584,
'response_received_count': 2,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/200': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/IndexError': 1,
'start_time': datetime.datetime(2023, 6, 19, 15, 39, 26, 944282)}
2023-06-19 17:39:27 [scrapy.core.engine] INFO: Spider closed (finished)
(base) alessandroagostinelli@iMacdiAssandro2 fifa-crawler %
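The traceback points at an unguarded tuple unpack on line 89 of `players_stats.py`. A defensive variant (a sketch under an assumed field layout, not the repository's actual code) would skip players whose page markup no longer matches the expected shape instead of raising:

```python
def parse_player_info(player_info: list):
    """Hypothetical guard around the failing unpack in players_stats.py.

    Sofifa's markup changes over time; when the expected text block is
    missing or malformed, return None so the spider can skip the player
    instead of dying on an IndexError or ValueError.
    """
    if not player_info:
        return None
    fields = player_info[0].split()
    if len(fields) != 6:
        return None
    age, month, day, year, height, weight = fields
    return {
        "age": age,
        "birth": (year, month, day),
        "height": height,
        "weight": weight,
    }


print(parse_player_info([]))  # None -- malformed page is skipped
```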

[Feature] Scraper for FIFA Teams

  • Current implementation scrapes player stats
  • Required Feature: Scrape data per team, divided in terms of
    country -> leagues -> teams

[Feature] Metadata Generator

  • Create new metadata on every update to the scraped data structure
  • Reason: Allows for easy preview of the data contents
