
kNearestNeighbors

This is the first assignment of the Introduction to Machine Learning (COMP 462) course. In this assignment, I implemented the k-NN classification algorithm from scratch and tested it on the Iris dataset. The classifier has four main methods: fit, predict, accuracy, and draw_decision_boundaries. It also takes two important input parameters:

  1. Number of Neighbors (k): This is the k value in the k-NN algorithm.
  2. Distance Metric: The metric used to compute distances between samples. It can be Euclidean distance, Manhattan distance, or cosine distance.
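The three supported metrics can be sketched as small standalone functions. This is only an illustrative sketch with my own helper names, assuming samples are plain tuples of floats; it is not necessarily how the assignment code defines them:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity, so that smaller values mean "closer".
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

Note that cosine distance ignores vector magnitude and compares only direction, which is one plausible reason it behaves differently from the other two metrics on this dataset.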

Dataset

The Iris dataset contains three flower species: Iris Versicolor, Iris Setosa, and Iris Virginica. Each sample has four features describing its sepal and petal characteristics. In this assignment, I used only the first and fourth features to fit the data and predict the class label.

In the Iris dataset, each species has 50 samples. I grouped the data by label and, for each class, used the first 30 samples for the training set and the last 20 samples for the test set.
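The per-class split described above can be sketched as follows; `split_by_class` is a hypothetical helper name, assuming the samples and labels are parallel lists:

```python
def split_by_class(samples, labels, n_train=30):
    # Group samples by label, then take the first n_train of each
    # class for training and the remainder for testing.
    train, test = [], []
    for cls in sorted(set(labels)):
        group = [(x, y) for x, y in zip(samples, labels) if y == cls]
        train += group[:n_train]
        test += group[n_train:]
    return train, test
```

With 50 samples per class, this yields 30 training and 20 test samples for each of the three species (90 training and 60 test samples in total).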

Classification Results

I tried different k values for each distance metric; the resulting k-NN classification accuracies are given in Table 1. According to Table 1, Euclidean distance is the most useful metric for classifying the Iris dataset, and cosine distance is the worst. In addition, the classifier reaches its best accuracy for all distance metrics at k = 3.
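The prediction and accuracy steps behind that comparison can be sketched as a majority vote among the k nearest training samples. This is a minimal sketch under my own names (`knn_predict`, `accuracy`), not the assignment's actual fit/predict implementation:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3, dist=None):
    # Default to Euclidean distance if no metric is given.
    if dist is None:
        dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    # Indices of the k training samples closest to x.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3, dist=None):
    correct = sum(knn_predict(train_X, train_y, x, k, dist) == y
                  for x, y in zip(test_X, test_y))
    return correct / len(test_y)
```

Sweeping k over a range of values and recording `accuracy` for each metric is how a table like Table 1 would be produced.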

Decision Boundaries

The training data is shown in Figure 1. In this step, I drew four decision boundaries with the following parameters:

  • k=3, distance metric=Euclidean distance
  • k=3, distance metric=Manhattan distance
  • k=3, distance metric=Cosine distance
  • k=1, distance metric=Euclidean distance

The resulting decision boundaries are shown in Figures 2 through 5, respectively.
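Decision-boundary figures like these are typically produced by classifying every point on a grid over the two features and coloring the grid by predicted class. The sketch below shows only that grid-labelling idea, with a hypothetical `decision_grid` helper; the actual figures were presumably rendered with a plotting library such as matplotlib:

```python
def decision_grid(classify, x_range, y_range, step=0.5):
    # Evaluate the classifier at every point of a coarse grid.
    # Plotting the resulting label matrix (e.g. with matplotlib's
    # contourf or pcolormesh) reveals the decision boundaries.
    xs, x = [], x_range[0]
    while x <= x_range[1]:
        xs.append(x)
        x += step
    ys, y = [], y_range[0]
    while y <= y_range[1]:
        ys.append(y)
        y += step
    return [[classify((x, y)) for x in xs] for y in ys]
```

A finer `step` gives smoother boundaries at the cost of more classifier evaluations, which is noticeable for k-NN since each prediction scans the whole training set.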

                                 Figure 1: Training samples on scatter plot

   Figure 2: Decision boundaries found by the k-NN algorithm (k=3, distance metric=Euclidean distance)

   Figure 3: Decision boundaries found by the k-NN algorithm (k=3, distance metric=Manhattan distance)

   Figure 4: Decision boundaries found by the k-NN algorithm (k=3, distance metric=Cosine distance)

   Figure 5: Decision boundaries found by the k-NN algorithm (k=1, distance metric=Euclidean distance)

Contributors

remziorak
