HW 1: Decision Trees and Foundations

This assignment is due October 13 at 11:59pm CDT. There are two points of extra credit for passing the test_setup test case, due early on October 6 at 11:59pm CDT. Late work will not be accepted except in extreme circumstances.

Academic integrity

Your work must be your own. Do not submit other people's work as your own, and do not allow others to submit your work as theirs. You may talk with other students about the homework, but you may not share code with them in any way. If you have a question about an error message or about why a numpy function returns what it does, post it on Piazza. If you need help debugging your code, make a private post on Piazza or come to office hours. If you're unsure of the academic integrity policies, ask for help; we can help you avoid breaking the rules, but we can't un-report a suspected violation.

By pushing your code to GitHub, you agree to these rules, and understand that there may be severe consequences for violating them.

Important instructions

Your work will be graded and aggregated using an autograder that will download the code and free response questions from each student's repository. If you don't follow the instructions, you run the risk of getting zero points. The test_setup test case gives you extra credit for following these instructions and will make it possible to grade your work easily.

The essential instructions:

Your code and written answers must be pushed to GitHub for us to grade them! We will only grade the latest version of your code that was pushed to GitHub before the deadline.
Your NetID must be in the netid file; replace NETID_GOES_HERE with your NetID.
Your answer to each free response question should be in its own PDF with the filename XXX_qYYY.pdf, where XXX is your NetID and YYY is the question number. So if your NetID is xyz0123, your answer to free response question 2 should be in a PDF file with the filename xyz0123_q2.pdf.
Please do not put your name in your free response PDFs -- we will grade these anonymously.
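If you want to double-check your PDF filenames before pushing, the naming rule above can be sketched in a few lines of Python. This is only an illustration: expected_pdf_name and follows_convention are hypothetical helpers that are not part of this repository, and the autograder may check filenames differently.

```python
import re


# Hypothetical helpers for illustration; they are not part of the course code.
def expected_pdf_name(netid: str, question: int) -> str:
    """Build the filename for one free response answer, e.g. xyz0123_q2.pdf."""
    return f"{netid}_q{question}.pdf"


def follows_convention(filename: str, netid: str) -> bool:
    """Return True if filename has the form <netid>_q<number>.pdf."""
    pattern = re.escape(netid) + r"_q\d+\.pdf"
    return re.fullmatch(pattern, filename) is not None


print(expected_pdf_name("xyz0123", 2))                   # xyz0123_q2.pdf
print(follows_convention("xyz0123_q2.pdf", "xyz0123"))   # True
print(follows_convention("xyz0123-q2.pdf", "xyz0123"))   # False
```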

Clone this repository

First, you need to clone this repository. If you haven't used git before, check out this helpful guide.

Once you have git installed on your computer, you'll need the link to this repository (find it on github.com under "Clone or Download"), which might look something like git@github.com:cs349/hw1-knn-decision-tree.git. Then run:

git clone YOUR-LINK

As soon as you've downloaded it, go ahead and add your NetID to the netid file, run git add netid, then git commit -m "added netid", and git push origin main. If you've successfully run those commands, you're almost done with the test_setup test case.

Environment setup

This course uses Python 3. Python 2 will not work for these assignments and all assignments will be graded with Python 3 on our end.

If you've used Python before, you've probably installed packages (e.g. numpy). To avoid having to uninstall old packages in order to reinstall the correct ones, you are strongly encouraged to create a "virtual environment" using miniconda. Virtual environments are a simple way to isolate all the dependencies for a particular project, making it easy to work on multiple projects at once without them interfering with each other (e.g. conflicting versions of libraries between projects). To make sure your environment exactly matches the testing environment that we use for grading, it's best to make a new environment for each assignment in this course.

Install the latest version of miniconda for your operating system. After installing, you should be able to run conda from your terminal. If you can't, you may need to run source ~/.bash_profile or restart your terminal. If for some reason you cannot use miniconda, try Python's venv module. Once you have conda set up, create a virtual environment by running:

conda create -n cs349hw1 python

Once it's created, you can activate it with:

conda activate cs349hw1

Here, cs349hw1 is the name for the environment. If you name it something else and forget what you named it, you can call conda env list to list all your conda environments. After activating it, you'll likely see that your terminal prompt has changed to include (cs349hw1). Now, you can install the packages necessary for this homework by going to the root directory of this repository and running:

pip install -r requirements.txt

Once these install, you're all set up to do the homework! You'll need to do this for each homework in this class. If you want to deactivate your environment, you can simply call conda deactivate.
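If you're not sure the right interpreter is active, a short check like the one below can help. It is a sketch that uses only the standard library; is_python3 is a hypothetical helper name. With the environment activated, the printed prefix would normally end with the environment name (cs349hw1 if you used the name above).

```python
import sys


def is_python3() -> bool:
    """Return True if the running interpreter is Python 3."""
    return sys.version_info.major == 3


# The version of the interpreter that is actually running this script.
print("Python version:", sys.version.split()[0])
# The installation directory of that interpreter.
print("Interpreter prefix:", sys.prefix)
print("Running Python 3:", is_python3())
```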

What to do for this assignment

The detailed instructions for the work you need to do are in problems.md.

For the coding portion of the assignment, you will:

Solve some numpy practice problems
Implement a train test split and cross validation
Implement some classification metrics
Implement a simple model that just predicts the mode (most common class)
Compute information gain
Implement a decision tree with the ID3 algorithm

You will also write up answers to a few free response questions.

In every function where you need to write code, there is a raise NotImplementedError in the code. The test cases will guide you through the work you need to do and tell you how many points you've earned. The test cases can be run from the root directory of this repository with:
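To illustrate what replacing one of these placeholders looks like, here is a minimal sketch. The function count_positive is made up for this example; the real function names, signatures, and docstrings are the ones in this repository's source files.

```python
# Hypothetical example; this function does not appear in the assignment.
def count_positive_stub(values):
    """What an unfinished function looks like: calling it always raises."""
    raise NotImplementedError


def count_positive(values):
    """The same function after the placeholder is replaced with real code."""
    return sum(1 for v in values if v > 0)


try:
    count_positive_stub([1, -2, 3])
except NotImplementedError:
    print("stub is not implemented yet")

print(count_positive([1, -2, 3]))  # 2
```

A test that calls an unfinished function fails with NotImplementedError, so that error in the pytest output usually points at the next function to work on.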

python -m pytest -s

To run a single test, you can specify it with -k, e.g., python -m pytest -s -k test_setup. To run a group of tests, you can use -k with a shared part of their names, e.g., python -m pytest -s -k test_decision will run all tests whose names contain test_decision. The -s means that any print statements you include will in fact be printed; by default (python -m pytest), pytest captures printed output and shows it only for failing tests.
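As a small illustration of how -k and -s behave, consider the sketch below. The file and test names are hypothetical and are not part of this repository's test suite.

```python
# Hypothetical test file, e.g. saved as test_example.py.
def test_decision_example():
    result = 1 + 1
    # This line shows up in the terminal only when pytest is run with -s.
    print("debug: result =", result)
    assert result == 2


def test_setup_example():
    assert isinstance("netid", str)
```

With this file, python -m pytest -s -k test_decision would select only test_decision_example and print its debug line, while python -m pytest with no options would run both tests and hide the print output as long as they pass.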

We will use these test cases to grade your work! Even if you change the test cases such that you pass the tests on your computer, we're still going to use the original test cases to grade your assignment.

Questions? Problems? Issues?

Ask a question on Piazza, and we'll help you there.

Helpful Material

Let’s Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8

Decision Tree Lecture Series

ID3-Algorithm: Explanation

Entropy

What is entropy in Data Science (very nice explanation)

Entropy as concept in Physics/Chemistry (only if you're interested)
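As a quick reference for the idea behind these links, the sketch below computes Shannon entropy (in bits) for a list of class labels using only the standard library. It is an illustration under assumptions: the function name and signature are hypothetical, and the functions you implement for this assignment may take different arguments and use numpy.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    # Sum -p * log2(p) over the classes that actually occur.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (evenly mixed)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure)
```

Entropy is highest when the classes are evenly mixed and zero when every label is the same.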

Recursion

Python: Recursion Explained

Recursion example
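For a small taste of recursion on a tree-shaped structure, here is a sketch that counts the leaves of a nested dictionary. The representation is an assumption made for this example (a leaf is a plain label, an internal node is a dict from attribute value to subtree); the tree class used in this assignment may look different.

```python
# Hypothetical tree representation, chosen only for this example.
def count_leaves(tree):
    """Count the leaves of a nested-dict tree recursively."""
    if not isinstance(tree, dict):
        return 1  # base case: a plain label is a single leaf
    # Recursive case: add up the leaves in every subtree.
    return sum(count_leaves(child) for child in tree.values())


toy_tree = {
    "sunny": {"high": "no", "normal": "yes"},
    "overcast": "yes",
    "rainy": "no",
}
print(count_leaves(toy_tree))  # 4
```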
