modeling-march-madness's Introduction

Modeling March Madness in Python

A four-part series by Nick Vogt

Repo is updated through Workshop 1: Web Scraping. The next workshop will be posted by February 20, 2017.

Join me and MSU Data Science in modeling March Madness entirely in Python!

Learn how to scrape web pages to build your dataset and model the tournament in 4 parts:

  1. Web Scraping
  2. Data Mining and Feature Engineering
  3. Choosing a Model
  4. Model Evaluation

At the end of this project, you will have a fully functional model of the NCAA Basketball Tournament. Tweak it to your own needs and desires to beat up on your friends and co-workers in a tourney pool.

Python Version and Libraries

These workshops use Python 3.5. I'm pretty certain any Python 3 version will suffice, but update your version if you have doubts.

We'll make heavy use of the requests and BeautifulSoup libraries. Neither is part of the Python standard library. requests comes bundled with some distributions, such as Anaconda, but BeautifulSoup will most likely need to be installed. If you don't know how, read the docs. Others have explained this better than I could, so if the docs leave you confused, a Google search should be sufficient to help you install it.
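
Before starting, it can help to confirm that your interpreter and libraries are in place. The snippet below is a minimal sketch and not part of the repo: `check_environment` is a hypothetical helper name, and it assumes BeautifulSoup is installed under its usual import name, `bs4`.

```python
import importlib.util
import sys


def check_environment(modules=("requests", "bs4")):
    """Report whether Python 3 is running and which modules are importable.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    report = {"python3": sys.version_info >= (3,)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print("{}: {}".format(item, "ok" if ok else "missing"))
```

Anything reported as missing will need to be installed before you run the notebooks.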

Of course, we'll use other libraries, but I've deliberately kept these workshops focused on Python built-ins.

Resources

There are three main resources:

  1. Jupyter Notebooks
  2. PowerPoint Presentations
  3. YouTube Videos (forthcoming)

Jupyter Notebooks and PowerPoint Presentations

There are PowerPoint (.pptx) and Jupyter notebook (.ipynb) files which complement each other. They are titled similarly, such that web-scraping.pptx corresponds to web-scraping.ipynb. PowerPoints can be found in the root directory for easy access. All notebooks and Python files are found in the src folder.

Directory Map
    - modeling-march-madness -- root directory.  
    |-/data -- stores clean data files.  
    |-/data-raw -- stores raw data files.  
    |-/html -- stores raw HTML in .txt files.  
    |-/src -- stores all python source files.  

The src scripts will create the remaining directories in the root directory when they are run.
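
As a rough illustration of that behavior, directories can be created on demand with the standard library. This is a sketch under assumptions, not the repo's actual code: `ensure_dirs` and `save_raw_html` are hypothetical names, and the directory names are taken from the map above.

```python
import os


def ensure_dirs(root, names=("data", "data-raw", "html")):
    """Create each named directory under root if it does not already exist."""
    paths = []
    for name in names:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if the directory exists
        paths.append(path)
    return paths


def save_raw_html(root, page_name, html_text):
    """Write raw HTML to <root>/html/<page_name>.txt and return the file path."""
    ensure_dirs(root, names=("html",))
    path = os.path.join(root, "html", page_name + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_text)
    return path
```

Because `exist_ok=True` makes the call safe to repeat, a script written this way can be rerun without first checking which directories are already there.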

YouTube Videos

The workshops will be posted on the MSU Data Science YouTube channel and delivered at Michigan State University throughout the Spring.

If you live near East Lansing, MI and want to attend the workshops, keep an eye on the MSU Data Science website or Facebook page for event updates. Attendance is free for everyone.

Python Files

All Python files can be found in the src subdirectory.

/src
    Scripts:
        Scraper.py -- Contains the main Scraper class.
        scrape-sports-reference.py -- Scrapes all cbb gamesheets and boxscores from sports-reference.com

    Notebooks:
        web-scraping.ipynb -- Corresponding notebook to the `web-scraping.pptx` file.

Legal Disclaimer

None of my communication should be understood as legal advice. I am not responsible for any liabilities incurred by users of this repo. It is your responsibility to ensure you are operating within your legal rights. As noted several times throughout the tutorials, scrape responsibly.
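
One common part of scraping responsibly is honoring a site's robots.txt rules and pausing between requests. The sketch below is an illustration only, not the repo's code: the robots rules, the user agent string, the URLs, and the `allowed_urls` helper are all made up for the example.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration; a real site publishes its own.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(urls, robots_text, user_agent="workshop-bot", delay=0.0):
    """Return the URLs that the given robots.txt text permits for user_agent.

    Sleeps for `delay` seconds after each permitted URL, standing in for the
    pause a scraper would take between real requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    permitted = []
    for url in urls:
        if parser.can_fetch(user_agent, url):
            permitted.append(url)
            time.sleep(delay)
    return permitted
```

In a real scraper, the rules would come from the target site's own robots.txt, and the delay would be set to a value the site can comfortably handle.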
