For this project, I use the Soccer SPI dataset.
Dataset link: https://github.com/fivethirtyeight/data/tree/master/soccer-spi
Steps to execute:
- Download the files from the GitHub repository.
- Extract the soccer_spi.csv file from the .rar archive.
- Place the CSV file in the datasets folder, and place the datasets folder inside the notebooks folder. The notebooks folder should also contain the .ipynb file.
- Open a terminal and run "jupyter notebook".
- Navigate to the folder where the notebook is placed.
- From the Cell menu, click Run All to run the whole notebook from the first cell, then verify the results.
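Before launching Jupyter, the folder layout described in the steps above can be checked with a short script. This is a minimal sketch, not part of the project: the helper name check_layout is hypothetical, and it assumes the folders are literally named "notebooks" and "datasets" and that the script runs from the directory containing the notebooks folder.

```python
from pathlib import Path


def check_layout(notebooks_dir):
    """Return a list of layout problems; an empty list means the layout looks fine."""
    root = Path(notebooks_dir)
    if not root.is_dir():
        return [f"missing folder: {root}"]

    problems = []
    # The notebook's file name is not given, so accept any .ipynb file.
    if not any(root.glob("*.ipynb")):
        problems.append(f"no .ipynb file in {root}")
    # Assumed location: notebooks/datasets/soccer_spi.csv
    csv_path = root / "datasets" / "soccer_spi.csv"
    if not csv_path.is_file():
        problems.append(f"missing file: {csv_path}")
    return problems


# Usage: report anything that is missing before running the notebook.
print(check_layout("notebooks") or "layout looks fine")
```

If the script reports a missing file, fix the folder layout before clicking Run All, since the notebook presumably reads the CSV by a relative path.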
The project is about building regression models that predict a yes/no or win/lose outcome, using the other columns as features.
Steps to follow:
- Set up a data science project structure in a new git repository in your GitHub account
- Pick one of the game data sets depending on your sports preference:
  - https://github.com/fivethirtyeight/nfl-elo-game
  - https://github.com/fivethirtyeight/data/tree/master/mlb-elo
  - https://github.com/fivethirtyeight/data/tree/master/nba-carmelo
  - https://github.com/fivethirtyeight/data/tree/master/soccer-spi
- Load the data set into pandas DataFrames
- Using exploratory data analysis, formulate one or two ideas for how feature engineering could add value to the data set
- Build one or more regression models to determine the scores for each team using the other columns as features
- Document your process and results
- Commit your notebook, source code, visualizations and other supporting files to the git repository in GitHub
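The loading, feature engineering, and regression steps above could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's actual code: the column names (spi1, spi2, score1, score2) are assumed and should be checked against soccer_spi.csv, the in-memory sample rows are made up so the example runs without the file, and the helper names are hypothetical. Ordinary least squares via numpy stands in for whichever regression model the notebook uses.

```python
import io

import numpy as np
import pandas as pd

# Made-up rows standing in for soccer_spi.csv; values are illustrative only.
# Column names are assumptions and may differ in the real file.
SAMPLE_CSV = """team1,team2,spi1,spi2,score1,score2
A,B,80.0,60.0,3,0
C,D,55.0,70.0,1,2
E,F,65.0,65.0,1,1
G,H,90.0,50.0,4,1
I,J,45.0,75.0,0,3
K,L,70.0,62.0,2,1
"""


def load_matches(source):
    """Load match rows into a DataFrame and drop rows without a final score."""
    df = pd.read_csv(source)
    return df.dropna(subset=["score1", "score2"]).reset_index(drop=True)


def add_features(df):
    """Add one engineered feature and a win/lose label for team1."""
    out = df.copy()
    out["spi_diff"] = out["spi1"] - out["spi2"]  # rating gap between the teams
    out["team1_win"] = (out["score1"] > out["score2"]).astype(int)
    return out


def fit_linear(df, feature_cols, target_col):
    """Fit ordinary least squares; returns [intercept, coefficient, ...]."""
    X = df[feature_cols].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    y = df[target_col].to_numpy(dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


def predict(coefs, df, feature_cols):
    """Apply fitted coefficients to the given feature columns."""
    X = df[feature_cols].to_numpy(dtype=float)
    return coefs[0] + X @ coefs[1:]


# With the real data, pass the CSV path instead, e.g. "datasets/soccer_spi.csv".
matches = add_features(load_matches(io.StringIO(SAMPLE_CSV)))
coefs = fit_linear(matches, ["spi_diff"], "score1")
matches["pred_score1"] = predict(coefs, matches, ["spi_diff"])
print(matches[["team1", "spi_diff", "score1", "pred_score1", "team1_win"]])
```

The same fit_linear call with "score2" as the target gives a model for the second team's score. For the win/lose outcome, the team1_win column can serve as the target of a classifier such as logistic regression.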