This package includes several of the functions I use to analyze football data. More functions will be added as I get everything situated.
Currently, I just have functions to generated style/profile similarities between players, teams, and leagues. As this is the first time I'm publishing a package, I needed to start small.
Similarity scores are purely based on the style, not quality of a player, league, or team. They are generated from over 35 variales (all z-scores so they are the same scale) and use Cosine Similarity. This is a pretty standard method, and StatsBomb uses Cosine Similarity as one method for their similar player search.
To install the package, you can type or copy/paste this into you command line:
pip install git+https://github.com/griffisben/griffis_soccer_analysis.git
If you want to update the package, you can first uninstall the old package by entering this line into your command line:
pip uninstall griffis_soccer_analysis
Here is an example of how to load the package.
from griffis_soccer_analysis.fbref_code import *
from griffis_soccer_analysis.similarity import *
from griffis_soccer_analysis.whoscored_match_report import *
Download a csv of events for a given match on the Detailed Tournaments section of Whoscored, and make a post-match report image
whoscored_match_report(
# The URL of the Whoscored match (detailed tournamnets only)
url = 'https://www.whoscored.com/Matches/1735525/Live/Belgium-Jupiler-Pro-League-2023-2024-Union-St-Gilloise-Eupen',
# The match name
match = 'Union St. Gilloise 4-1 Eupen',
# Home team name
team_h = 'Union St. Gilloise',
# Whoscored's ID for the home team (can find in the html slug of the team's page)
teamId_h = 2647,
# Away team name
team_a = 'Eupen',
# Away team ID
teamId_a = 2166,
# League name
lg = 'Belgian Pro League',
# Game date
date = 'Oct 20, 2023',
# Your signature
sig = '@BeGriffis',
# Language for the image (English, Spanish, Portuguese, French)
language = 'English',
# Location of your Chromedriver
chrome_driver_loc = r"C:\Users\Ben\chromedriver.exe",
# Where you want the data to download to
data_download_loc = "C:/users/ben/downloads",
# Where you want the final image to export to
img_save_loc = "C:/users/ben/downloads",
# Path to home team's image
home_img = "C:/users/ben/club images/Union St. Gilloise.png",
# Path to away team's image
away_img = "C:/users/ben/club images/Eupen.png"
)
Download the latest FBRef data (via Opta) for UEFA top 5 players (note, will only work for the current, ongoing season)
scrape_fbref_t5_leagues_players(season = '2023-2024')
Download the latest FBRef data (via Opta) for all other leagues FBRef has advanced data for (note, will only work for the current, ongoing season)
comps = ['2. Bundesliga',
'Ligue 2',
'Serie B',
'La Liga 2',
'Belgian Pro League',
'Argentine Primera División',
'Argentina Copa de la Liga'
'Liga MX',
'MLS',
'Brasileirão',
'Eredivisie',
'Primeira Liga',
'Championship',
]
ssns = ['2023-2024',
'2023-2024',
'2023-2024',
'2023-2024',
'2023-2024',
'2023',
'2023',
'2023-2024',
'2023',
'2023',
'2023-2024',
'2023-2024',
'2023-2024',
]
scrape_fbref_next12_leagues_players(comps = comps, seasons = ssns)
combine_t5_next12_fbref(t5_season = '2023-2024')
fbref_scout_report(season = '2023-2024',
program = 'mf', # gk, df, mf, fw (each has 12 selected metrics for the position)
player_pos = 'MF', # GK, DF, MF, FW (FBRef positions)
playerPrompt = 'Lucas Besozzi',
SquadPrompt = '',
minutesPlayed = 700,
compP = 'Argentine Primera División',
saveinput = 'n',
signature = '@BeGriffis',
data_date = 'Data as of 9/29/23',
fbref_file_path = 'C:/Users/Ben/From Mac/Python/'
)
For a Juypter Notebook file with example code & output and more info on each variable, please see this file
available_leagues()
teams_in_league(
league = "A-League Men 22-23"
)
available_players(
league = "A-League Men 22-23",
team = 'Western Sydney Wanderers'
)
# This is how to grab all outputs
df, info, dist_fig = league_similarity(
league = "A-League Men",
season = "22-23",
nlgs = 15
)
# This is one way to print the information and then the similarity dataframe
for i in range(len(info)):
print(info[i])
df
# This is how to grab all outputs
df, info, dist_fig = team_similarity(
team = "Western Sydney Wanderers",
league = "A-League Men",
season = "22-23",
nlgs = 15
)
# This is one way to print the information and then the similarity dataframe
for i in range(len(info)):
print(info[i])
df
"""
CHOICES FOR compare_league VARIABLE
All
UEFA T5
UEFA Next 10
UEFA Next 20
UEFA T5 2nd Tiers
8 AFC
CONMEBOL Top 4
Argentina & Brazil
Scandinavia (note: this only includes top tiers from Denmark, Norway, Sweden, and Finland)
Or, make your own by writing each league-season name (can find with available_leagues()) separated by a space & comma:
'MLS 2023, Liga MX 23-24'
'Premier League 23-24, Championship 23-24, League One 23-24, League Two 23-24'
"""
df, info, dist_fig = player_similarity(
player = "E. Spertsyan (23, Krasnodar, Russian Premier League 23-24)",
position = "CM",
nplayers = 20,
compare_leagues = 'All',
min_age = 16, # Minimum age of similar players (always still inside the top 5%)
max_age = 40, # Maximum age of similar players (always still inside the top 5%)
similar_lg_team = False,
mean_sim = False,
)
# This is one way to print the information and then the similarity dataframe
for i in range(len(info)):
print(info[i])
df