
HW 1: Decision Trees and Foundations

This assignment is due October 13 at 11:59pm CDT. There are two points of extra credit for passing the test_setup test case; that extra credit is due earlier, on October 6 at 11:59pm CDT. Late work will not be accepted except in extreme circumstances.

Academic integrity

Your work must be your own. Do not submit other people's work as your own, and do not allow others to submit your work as theirs. You may talk with other students about the homework, but you may not share code with them in any way. If you have a question about an error message or about why a numpy function returns what it does, post it on Piazza. If you need help debugging your code, make a private post on Piazza or come to office hours. If you're unsure of the academic integrity policies, ask for help; we can help you avoid breaking the rules, but we can't un-report a suspected violation.

By pushing your code to GitHub, you agree to these rules, and understand that there may be severe consequences for violating them.

Important instructions

Your work will be graded and aggregated using an autograder that will download the code and free response questions from each student's repository. If you don't follow the instructions, you run the risk of getting zero points. The test_setup test case gives you extra credit for following these instructions and will make it possible to grade your work easily.

The essential instructions:

  • Your code and written answers must be pushed to GitHub for us to grade them! We will only grade the latest version of your code that was pushed to GitHub before the deadline.
  • Your NetID must be in the netid file; replace NETID_GOES_HERE with your NetID.
  • Your answer to each free response question should be in its own PDF with the filename XXX_qYYY.pdf, where XXX is your NetID and YYY is the question number. So if your NetID is xyz0123, your answer to free response question 2 should be in a PDF file with the filename xyz0123_q2.pdf.
  • Please do not put your name in your free response PDFs -- we will grade these anonymously.

Clone this repository

First, you need to clone this repository. If you haven't used git before, check out this helpful guide.

Once you have git installed on your computer, you'll need the link to this repository (find it on GitHub under "Clone or Download"), which might look something like git@github.com:cs349/hw1-knn-decision-tree.git. Then run:

git clone YOUR-LINK

As soon as you've downloaded it, go ahead and add your NetID to the netid file, run git add netid, then git commit -m "added netid", and git push origin main. If you've successfully run those commands, you're almost done with the test_setup test case.

Environment setup

This course uses Python 3. Python 2 will not work for these assignments, and all assignments will be graded with Python 3 on our end.

If you've used Python before, you've probably installed packages (e.g. numpy). To avoid having to uninstall old packages in order to reinstall the correct ones, you are strongly encouraged to create a "virtual environment" using miniconda. Virtual environments are a simple way to isolate all the dependencies for a particular project, making it easy to work on multiple projects at once without them interfering with each other (e.g. conflicting versions of libraries between projects). To make sure your environment exactly matches the testing environment we use for grading, it's best to make a new environment for each assignment in this course.

Install the latest version of miniconda for your operating system. After installing, you should be able to run conda from your terminal. If you can't, you may need to run source ~/.bash_profile or restart your terminal. If for some reason you cannot use miniconda, try Python's venv module. Once you have conda set up, create a virtual environment by running:

conda create -n cs349hw1 python

Once it's created, you can activate it with:

conda activate cs349hw1

Here, cs349hw1 is the name for the environment. If you name it something else and forget what you named it, you can call conda env list to list all your conda environments. After activating it, you'll likely see that your terminal prompt has changed to include (cs349hw1). Now, you can install the packages necessary for this homework by going to the root directory of this repository and running:

pip install -r requirements.txt

Once these install, you're all set up to do the homework! You'll need to do this for each homework in this class. If you want to deactivate your environment, you can simply call conda deactivate.

What to do for this assignment

The detailed instructions for the work you need to do are in problems.md.

For the coding portion of the assignment, you will:

  • Solve some numpy practice problems
  • Implement a train test split and cross validation
  • Implement some classification metrics
  • Implement a simple model that just predicts the mode (most common class)
  • Compute information gain (a rough sketch of the idea appears after this list)
  • Implement a decision tree with the ID3 algorithm
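
As a rough illustration of the information-gain computation, here is a minimal numpy sketch. The function names and signatures below are illustrative only; the required signatures are defined in the starter code and problems.md.

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a 1-D array of class labels.
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(labels, feature_values):
    # Entropy of the parent minus the weighted entropy of the children
    # produced by splitting on a single categorical feature.
    parent = entropy(labels)
    weighted_children = 0.0
    for value in np.unique(feature_values):
        mask = feature_values == value
        weighted_children += mask.mean() * entropy(labels[mask])
    return parent - weighted_children

# A perfectly informative binary feature recovers the full parent entropy:
y = np.array([0, 0, 1, 1])
x = np.array(["a", "a", "b", "b"])
print(information_gain(y, x))  # -> 1.0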

You will also write up answers to a few free response questions.

In every function where you need to write code, there is a raise NotImplementedError in the code. The test cases will guide you through the work you need to do and tell you how many points you've earned. The test cases can be run from the root directory of this repository with:

python -m pytest -s

To run a single test, you can specify it with -k, e.g., python -m pytest -s -k test_setup. To run a group of tests, you can use -k with a prefix, e.g., python -m pytest -s -k test_decision will run all tests that begin with test_decision. The -s means that any print statements you include will in fact be printed; the default behavior (python -m pytest) will suppress everything but the pytest output.

We will use these test cases to grade your work! Even if you change the test cases such that you pass the tests on your computer, we're still going to use the original test cases to grade your assignment.

Questions? Problems? Issues?

Ask a question on Piazza, and we'll help you there.

Helpful Material

Let’s Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8

Decision Tree Lecture Series

  1. How it works
  2. ID3 Algorithm
  3. Which attribute to split on
  4. Information Gain and Entropy

ID3-Algorithm: Explanation

Entropy

  • What is entropy in Data Science (very nice explanation)
  • Entropy as a concept in Physics/Chemistry (only if you're interested)
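
For reference, the standard definitions used by ID3 (written here in generic notation, which may differ slightly from the notation in problems.md): for a set S with class proportions p_i, and a split of S on attribute A,

    H(S) = -\sum_i p_i \log_2 p_i

    IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} H(S_v)

where S_v is the subset of S in which attribute A takes value v.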

Recursion

  • Python: Recursion Explained
  • Recursion example
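
If recursion is new to you, here is a tiny self-contained example of the pattern: a function that calls itself on each subtree until it reaches a base case. The Node class below is a toy, not the assignment's tree structure.

class Node:
    # Toy binary tree node, just for demonstrating recursion;
    # the assignment's tree class will look different.
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def count_leaves(node):
    # Base case: a node with no children is a leaf.
    if node.left is None and node.right is None:
        return 1
    # Recursive case: add up the leaves in each existing subtree.
    total = 0
    if node.left is not None:
        total += count_leaves(node.left)
    if node.right is not None:
        total += count_leaves(node.right)
    return total

tree = Node(left=Node(), right=Node(left=Node(), right=Node()))
print(count_leaves(tree))  # -> 3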
