ipl_prediction

CA4021 Final Year Project

Pre-Game and In-Game Outcome Prediction in Cricket

This project predicts the outcome of a T20 cricket match using a selection of features. It analyses the Indian Premier League (IPL) from 2008 to 2020 and uses a publicly available Kaggle dataset to build a predictive model. The effects of the toss, venue, city, and teams on the match outcome are investigated alongside in-game statistics. The dataset can be found here. A Big Bash League (BBL) dataset is also used for model validation, to measure how robust the model is and how well it transfers to data from a different league and country. This dataset can be found here.

General Information

The objective of this project is to predict the outcome of cricket games based on a selection of features. The data used in the predictive model comes from T20 matches in the IPL from 2008 to 2020 and consists of two files, matches.csv and ball_by_ball.csv. The aim is to better understand which features are most influential in this prediction task. The features used include city, venue, country, toss outcome, and the teams playing, alongside in-game scores. The project explores the dataset to identify which aspects of the data carry the most weight in prediction, and compares this with what domain knowledge of professional cricket would lead us to expect. The different phases of a game are also examined to see which sections of the game have the largest impact on the result.

There are four components to the project:

Data Analysis

Both CSV files in the IPL dataset are inspected to identify features that may be useful in the predictive analysis. The toss, home advantage, and the decision to bat first or field first are investigated further.
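The toss and bat/field analysis can be sketched as follows. This is a minimal illustration, not the project's notebook code: it assumes matches.csv exposes columns named `team1`, `team2`, `toss_winner`, `toss_decision`, and `winner`, and it uses a hypothetical six-row toy frame in place of the real data.

```python
import pandas as pd

# Hypothetical toy stand-in for matches.csv; column names are assumptions.
matches = pd.DataFrame({
    "team1":         ["MI", "CSK", "RCB", "KKR", "MI", "CSK"],
    "team2":         ["CSK", "RCB", "KKR", "MI", "RCB", "KKR"],
    "toss_winner":   ["MI", "RCB", "RCB", "MI", "RCB", "CSK"],
    "toss_decision": ["bat", "field", "field", "bat", "field", "bat"],
    "winner":        ["MI", "CSK", "RCB", "KKR", "RCB", "KKR"],
})

def toss_win_rates(df: pd.DataFrame) -> pd.Series:
    """Share of matches won by the toss winner, per toss decision and overall."""
    # Matches with no result (missing winner) compare as False here;
    # filter them out first when working with the real data.
    won_toss_and_match = df["toss_winner"] == df["winner"]
    rates = won_toss_and_match.groupby(df["toss_decision"]).mean()
    rates["overall"] = won_toss_and_match.mean()
    return rates

def batting_first_team(row: pd.Series) -> str:
    """Infer which side batted first from the toss winner and their decision."""
    other = row["team2"] if row["toss_winner"] == row["team1"] else row["team1"]
    return row["toss_winner"] if row["toss_decision"] == "bat" else other

def bat_first_win_rate(df: pd.DataFrame) -> float:
    """Share of matches won by the team that batted first."""
    return float((df.apply(batting_first_team, axis=1) == df["winner"]).mean())

print(toss_win_rates(matches))
print(bat_first_win_rate(matches))
```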

Data Transformation

The ball-by-ball dataset is manipulated to create calculated fields for the in-game scores.
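One way to derive in-game score fields from ball-by-ball data is to take running totals of runs and wickets within each innings, then read them off at a chosen over. This is a minimal pandas sketch, assuming ball_by_ball.csv has columns `id`, `inning`, `over`, `ball`, `total_runs`, and `is_wicket`, with overs numbered from 0; the helper names and toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy stand-in for ball_by_ball.csv; column names are assumptions.
balls = pd.DataFrame({
    "id":         [1, 1, 1, 1, 1, 1],
    "inning":     [1, 1, 1, 1, 2, 2],
    "over":       [0, 0, 1, 1, 0, 0],
    "ball":       [1, 2, 1, 2, 1, 2],
    "total_runs": [4, 1, 0, 6, 2, 0],
    "is_wicket":  [0, 0, 1, 0, 0, 1],
})

def add_running_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Add cumulative runs and wickets within each innings of each match."""
    out = df.sort_values(["id", "inning", "over", "ball"]).copy()
    grouped = out.groupby(["id", "inning"])
    out["cum_runs"] = grouped["total_runs"].cumsum()
    out["cum_wickets"] = grouped["is_wicket"].cumsum()
    return out

def score_after_over(df: pd.DataFrame, over: int) -> pd.DataFrame:
    """Runs and wickets at the end of a given (0-indexed) over, per innings."""
    totals = add_running_totals(df)
    totals = totals[totals["over"] <= over]
    return totals.groupby(["id", "inning"])[["cum_runs", "cum_wickets"]].last()

print(score_after_over(balls, 0))
```

Under the 0-indexed assumption, `score_after_over(balls, 5)` would give the score at the end of the six-over powerplay.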

Pre-Game Prediction

Predictive analysis is run on the matches dataset using a variety of pre-game features across a selection of algorithms.
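A pre-game model of this kind might be assembled as a scikit-learn pipeline that one-hot encodes the categorical pre-match features and feeds them to a classifier. The sketch below is hypothetical: the feature names, the binary target `team1_won`, the hyperparameters, and the randomly generated data are all assumptions, so the printed accuracy says nothing about the real IPL results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical pre-game feature set; toss_winner_is_team1 is a 0/1 flag.
PRE_GAME_FEATURES = ["team1", "team2", "venue", "toss_winner_is_team1", "toss_decision"]
CATEGORICAL = ["team1", "team2", "venue", "toss_decision"]

def make_pre_game_model(random_state: int = 0) -> Pipeline:
    """One-hot encode categorical columns, pass the 0/1 flag through, fit a forest."""
    encode = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        remainder="passthrough",
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    return Pipeline([("encode", encode), ("clf", clf)])

# Synthetic stand-in for matches.csv (random labels, so accuracy is ~chance).
rng = np.random.default_rng(0)
teams = ["CSK", "DC", "KKR", "MI", "PBKS", "RCB", "RR", "SRH"]
venues = ["Wankhede", "Eden Gardens", "Chepauk", "Chinnaswamy"]
pairs = [rng.choice(teams, size=2, replace=False) for _ in range(400)]
matches = pd.DataFrame({
    "team1": [p[0] for p in pairs],
    "team2": [p[1] for p in pairs],
    "venue": rng.choice(venues, size=400),
    "toss_winner_is_team1": rng.integers(0, 2, size=400),
    "toss_decision": rng.choice(["bat", "field"], size=400),
})
matches["team1_won"] = rng.integers(0, 2, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    matches[PRE_GAME_FEATURES], matches["team1_won"], test_size=0.25, random_state=0
)
model = make_pre_game_model().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

On real data, a chronological split (for example, training on earlier seasons and testing on later ones) is a stricter test than a random split, since it avoids training on matches played after the ones being predicted.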

In-Game Prediction

Predictive analysis is run on in-game statistics from the transformed dataset. The BBL data is then used to evaluate the model's performance, and the model is further evaluated using live in-game betting odds.
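The cross-league check and the betting-odds comparison can be sketched together: fit an in-game model on one league, score it on another, and convert bookmaker decimal odds into implied probabilities that can be set beside the model's predicted probabilities. Everything here is an illustrative assumption: the three features, the synthetic generator standing in for IPL and BBL snapshots, the choice of logistic regression, and the example odds. Normalising the reciprocals of the odds is one simple way to remove the bookmaker's margin; other de-margining methods exist.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def make_in_game_snapshots(n, rng):
    """Hypothetical second-innings snapshots: runs needed, balls left, wickets in hand."""
    runs_needed = rng.integers(1, 150, size=n)
    balls_left = rng.integers(12, 121, size=n)
    wickets_in_hand = rng.integers(1, 11, size=n)
    required_rate = 6.0 * runs_needed / balls_left  # runs per over still needed
    # Toy labelling rule (not fitted to real data): the chasing side is more
    # likely to win with a low required rate and more wickets in hand.
    logit = 3.0 - 0.5 * required_rate + 0.3 * wickets_in_hand
    chase_won = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
    X = np.column_stack([runs_needed, balls_left, wickets_in_hand])
    return X, chase_won.astype(int)

def implied_probabilities(decimal_odds):
    """Turn decimal odds into probabilities summing to 1 (proportional de-margining)."""
    raw = 1.0 / np.asarray(decimal_odds, dtype=float)
    return raw / raw.sum()

rng = np.random.default_rng(1)
X_ipl, y_ipl = make_in_game_snapshots(2000, rng)  # stand-in for IPL training data
X_bbl, y_bbl = make_in_game_snapshots(500, rng)   # stand-in for BBL validation data

model = LogisticRegression(max_iter=1000).fit(X_ipl, y_ipl)
bbl_accuracy = accuracy_score(y_bbl, model.predict(X_bbl))
print(f"accuracy on held-out league: {bbl_accuracy:.2f}")

# Made-up in-play moment: 60 needed off 48 balls, 6 wickets in hand,
# with made-up odds of 1.80 (chasing side) and 2.10 (defending side).
snapshot = np.array([[60, 48, 6]])
model_p = model.predict_proba(snapshot)[0, 1]
market = implied_probabilities([1.80, 2.10])
print(f"market: {market[0]:.3f}  model: {model_p:.3f}")
```

For the made-up odds of 1.80 and 2.10, the normalised implied probabilities are roughly 0.538 and 0.462.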

Technology Used

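Analysis is completed in Python using Jupyter Notebook. Scikit-learn provides most of the predictive models, as it offers a selection of algorithms appropriate for this task; XGBoost comes from the separate xgboost library, which exposes a scikit-learn-compatible interface. The selected models are: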

  • Random Forest
  • Support Vector Machine (SVM)
  • Logistic regression
  • Naïve Bayes
  • XGBoost
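A like-for-like comparison of the five models could use k-fold cross-validation on the same feature matrix. The sketch below is an assumption-laden illustration: the hyperparameters, the scaling step for SVM and logistic regression, the use of `GaussianNB` for Naïve Bayes, and the synthetic `make_classification` data are not taken from the project. XGBoost is only included if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def candidate_models(random_state: int = 0) -> dict:
    """The five model families listed above, with assumed default-ish settings."""
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Naive Bayes": GaussianNB(),  # assumption: the Naive Bayes variant is not stated
    }
    try:
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(n_estimators=200, random_state=random_state)
    except ImportError:
        pass  # xgboost is a separate package; skip it if it is not installed
    return models

# Synthetic binary-outcome data standing in for the real feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidate_models().items()
}
for name, score in scores.items():
    print(f"{name:20s} {score:.2f}")
```

Scaling is wrapped into pipelines for SVM and logistic regression so that it is re-fitted inside each fold rather than on the full dataset.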

Project Status

Project is: completed

Results

| Model | All data columns | With feature selection | With one-hot encoding of teams | With one-hot encoding of home team |
| --- | --- | --- | --- | --- |
| Random Forest | 0.66 | 0.57 | 0.65 | 0.67 |
| Logistic Regression | 0.47 | 0.56 | 0.56 | 0.55 |
| Naïve Bayes | 0.58 | 0.60 | 0.57 | 0.55 |
| Support Vector Machine | 0.56 | 0.55 | 0.56 | 0.55 |
| XGBoost | 0.52 | 0.57 | 0.57 | 0.55 |

Room for Improvement

For future work, it would be interesting to add player features and prior-performance data, such as team form, to the historical data to see whether this improves pre-match prediction. Drilling further into countries, time of day, and teams could also show whether pre-game predictive accuracy can be improved.
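Team form, suggested above as a possible addition, could be computed as each team's win rate over its previous few matches. This is a hypothetical sketch assuming matches.csv has `date`, `team1`, `team2`, and `winner` columns; the function name, the `window` size, and the toy rows are illustrative. The key design choice is shifting each team's results by one match before averaging, so a match's own outcome never leaks into its form feature.

```python
import pandas as pd

def add_team_form(matches: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add team1_form / team2_form: each side's win rate over its previous `window` matches."""
    m = matches.sort_values("date", kind="mergesort").reset_index(drop=True)
    m["match_idx"] = m.index
    # Long format: one row per (match, team) so results can be rolled per team.
    long = pd.concat([
        pd.DataFrame({"match_idx": m["match_idx"], "team": m["team1"], "side": "team1"}),
        pd.DataFrame({"match_idx": m["match_idx"], "team": m["team2"], "side": "team2"}),
    ], ignore_index=True)
    winners = m.loc[long["match_idx"], "winner"].to_numpy()
    long["won"] = (long["team"].to_numpy() == winners).astype(float)
    long = long.sort_values(["team", "match_idx"])
    # shift(1) excludes the current match; a team's first match gets NaN (no history).
    long["form"] = long.groupby("team")["won"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    wide = long.pivot(index="match_idx", columns="side", values="form")
    m["team1_form"] = wide["team1"]
    m["team2_form"] = wide["team2"]
    return m.drop(columns="match_idx")

# Hypothetical toy fixtures.
example = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
    "team1": ["MI", "CSK", "MI", "MI"],
    "team2": ["CSK", "RCB", "RCB", "CSK"],
    "winner": ["MI", "CSK", "RCB", "CSK"],
})
print(add_team_form(example)[["team1", "team2", "team1_form", "team2_form"]])
```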
