
Machine Learning

Config

Meta

Babel

HTML export

Styles

Ansi output

# echo "$x" | ansi2html --inline
echo '<div class="codeblock ansi">'
echo "$x" | ansi2html --inline --light-background
echo "</div>"

About

This document is written in Emacs with Org mode, for export to HTML.

The inline code blocks in this document are executed sequentially in a single python session by org-babel. Usage here is meant to be representative of use during an interactive session.

portfolio.visualize.Plot.embed() produces a representation of the plot for embedding in an HTML document. For normal use, use save or show instead.

The source of this document and the project are available on GitHub here. If you are viewing this on GitHub, an exported version with output and graphs is available here.

Setup

Install the project dependencies via Poetry

poetry install --no-interaction --ansi

Activate the venv. In Emacs you can activate a venv located at ./.venv via

(pyvenv-activate ".venv")

Load iris dataset

import numpy as np
from portfolio.visualize import Plot

from sklearn.datasets import load_iris
iris = load_iris()

Get the data for only types 1 and 2 for the binary classifiers, and label the classes as -1 or 1

x = iris.data[iris.target > 0]
y = np.where(iris.target[iris.target > 0] == 1, -1, 1)

Models

Perceptron

Example runs from project description

This section contains code equivalent to that in the example run in the project documentation, to show that it meets the specifications.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from portfolio.perceptron import Perceptron, plot_decision_regions, IRIS_OPTIONS
from portfolio.visualize import Plot

Download and parse the dataset

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', **IRIS_OPTIONS)

Extract the first 100 labels (which are the first two types)

y = df.iloc[0:100, 4].values
y

Since we only selected the first two, the classes are either Iris-setosa or Iris-versicolor. Label Iris-setosa as -1 and everything else as 1.

y = np.where(y == 'Iris-setosa', -1, 1)
y

The first and third features are separable, so select only those as the values in X

X = df.iloc[0:100, [0, 2]].values
X

Plotting the data, we can clearly see that these two features are separable.

plt.figure()
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.xlabel('sepal length')
plt.ylabel('petal length')
plt.legend(loc='upper left')
Plot.embed(plt)

Initialize the perceptron with a learning rate of 0.1 and a maximum of 1000 iterations.

pn = Perceptron(0.1, 1000)

fit runs the perceptron algorithm on the given data. The number of errors per iteration is stored in errors. Since this only took 6 iterations to converge, it dropped out early instead of doing the entire 1000.

pn.fit(X, y)
pn.errors
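
For reference, here is a minimal sketch of the kind of fit loop described above. It is only an illustration of the algorithm, using the standard mistake-driven update rule, not the actual code behind portfolio.perceptron.Perceptron.

import numpy as np

def perceptron_fit(X, y, eta=0.1, max_iter=1000):
    # w[0] is the bias; the remaining entries are one weight per feature
    w = np.zeros(1 + X.shape[1])
    errors = []
    for _ in range(max_iter):
        n_errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w[1:] + w[0] >= 0 else -1
            if pred != target:
                # move the boundary toward the misclassified sample
                w[1:] += eta * target * xi
                w[0] += eta * target
                n_errors += 1
        errors.append(n_errors)
        if n_errors == 0:
            # converged: drop out early instead of running all max_iter passes
            break
    return w, errors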

This data can be seen plotted below.

plt.figure()
plt.plot(range(1, len(pn.errors) + 1), pn.errors, marker='o')
plt.xlabel('Iteration')
plt.ylabel('# of misclassifications')
Plot.embed(plt)
pn.net_input(X)

Running it on the original data in order, we can see that it correctly classifies all of the samples.

pn.predict(X)
pn.weight

Here is a visualization of the computed decision boundary.

plot_decision_regions(X, y, pn)
Plot.embed(plt)

The other data I tested against:

First, data where the two classes are very close together. 701 iterations seems like a really big number, but since the classes are so close, I'm assuming it could be right.

X1 = df.iloc[0:100, [0, 1]].values
pn.fit(X1, y)
pn.errors.shape
plot_decision_regions(X1, y, pn)
Plot.embed(plt)

And data with a very wide separation. Here we can see it dropping out after only 3 iterations.

X2 = df.iloc[0:100, [2, 3]].values
pn.fit(X2, y)
pn.errors
plot_decision_regions(X2, y, pn)
Plot.embed(plt)

Close all the figures we opened:

plt.close('all')

Linear Regression

For single-dimension data, linear regression is defined as $w = A^{-1}b$, where $A$ is $xx^\mathsf{T}$ and $b$ is $yx$. Since the matrix may not be invertible, we take the pseudo-inverse. Since I appended a column of ones to $x$ when forming $A$, $w_0$ holds the y intercept, and since the data has only a single dimension, $w_1$ holds the slope of the line.
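
As a rough illustration of that closed form (not the project's LinearRegression class), the whole fit reduces to a few numpy calls; fit_line is a made-up helper name for this example.

import numpy as np

def fit_line(x, y):
    # design matrix with a column of ones, so w[0] is the y intercept
    X = np.c_[np.ones_like(x), x]
    A = X.T @ X  # A = x x^T, summed over the samples
    b = X.T @ y  # b = y x
    # pseudo-inverse in case A is not invertible
    return np.linalg.pinv(A) @ b  # w[0] = intercept, w[1] = slope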

The linear regression code may be run on the iris dataset with the following:

from portfolio.linear_regression import LinearRegression

x_sepal_width = x[:, 1]

regression_plot = Plot(x_sepal_width, y)
regression_plot.add_view(LinearRegression)
regression_plot.embed()

Decision Stumps

Decision stumps are a type of weak learner. They consist of a decision tree with only a single node. My implementation only accepts a binary split of classes labeled -1 and 1, although in theory a decision stump can split based on threshold into any number of classes.

I couldn’t figure out how to translate the pseudocode for the efficient $O(dm)$ implementation, so this is the naïve $O(dm^2)$ version. I also have a somewhat fuzzy understanding of the book’s explanation, but this is equivalent as far as I can tell.

Since the decision stump needs a threshold $θ$ in order to classify everything on each side as a different class, we need to define a set of candidate thresholds. Given that two points may be arbitrarily close, I used the $x$ values from the data themselves as the candidates, to ensure that if there was a single optimal $θ$ it would be selected.

The other part of the decision stump is the dimension of the data against which it will classify. This implementation considers all given dimensions and selects the most suitable one.

The last part is the error function. The error function I used is a count of the differences between the predicted class and the expected class.

For each of the combinations of dimensions and possible thresholds, the decision stump checks the error function. The one with the lowest error is selected as the dimension to classify against and the $θ$ to use as a threshold.
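
A minimal sketch of that naïve $O(dm^2)$ search might look like the following. It is only an illustration of the idea, not the project's DecisionStumps class, and it assumes the convention that samples at or below $θ$ are labeled -1.

import numpy as np

def fit_stump(X, y):
    best_error, best_dim, best_theta = np.inf, None, None
    for dim in range(X.shape[1]):
        # candidate thresholds are the data values themselves
        for theta in np.unique(X[:, dim]):
            pred = np.where(X[:, dim] <= theta, -1, 1)
            # error is the count of differences between prediction and label
            error = np.count_nonzero(pred != y)
            if error < best_error:
                best_error, best_dim, best_theta = error, dim, theta
    return best_dim, best_theta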

Shown here is the decision boundary, along with the values plotted for the chosen dimension, with the label on the y axis. As you can see, these are not separable, but the optimal boundary was still found.

from portfolio.decision_stumps import DecisionStumps

x_sepal_width = x[:, 1]

regression_plot = Plot(x, y)
regression_plot.add_view(DecisionStumps)
regression_plot.embed()

Support Vector Machine

Soft SVM

SVMs are a linear classification model. Unlike the perceptron, SVMs optimize for the widest margin between classes.

In addition, since this is a soft-margin SVM, it uses a hinge loss function. This allows it to accept data that is not linearly separable and still produce a reasonable classification boundary.

Optimization of this loss function is implemented via stochastic gradient descent (SGD).
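
As an illustration of what that optimization looks like, here is a minimal Pegasos-style SGD sketch for the regularized hinge loss. The regularization constant and epoch count are made-up values, and the project's svm.Soft class may differ in its details.

import numpy as np

def fit_soft_svm(X, y, lam=0.01, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.c_[X, np.ones(len(X))]  # fold the bias into the weight vector
    w = np.zeros(Xb.shape[1])
    for t in range(1, epochs * len(Xb) + 1):
        i = rng.integers(len(Xb))
        eta = 1.0 / (lam * t)  # decreasing step size
        if y[i] * (Xb[i] @ w) < 1:
            # inside the margin or misclassified: hinge term is active
            w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
        else:
            w = (1 - eta * lam) * w
    return w  # predict with sign(x @ w[:-1] + w[-1])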

For example, this data from iris is not separable.

from portfolio import svm

y = df.iloc[50:, 4].values
y = np.where(y == 'Iris-versicolor', -1, 1)
X = df.iloc[50:, [0, 3]].values
np.c_[y, X]

Running the soft SVM classifier on it, we get a reasonably accurate classification of the data.

plot = Plot(X, y)
plot.add_view(svm.Soft)
plot.embed()

K-Nearest Neighbor

The k-nearest neighbors classifier simply classifies a sample based upon the closest training data.

At least in this implementation, though also in general to a lesser extent, KNN tends to be very computationally intensive for large datasets, since it must compute the distance from the query to every training point in order to find the nearest ones.
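
For illustration, the brute-force prediction described above can be written in a few lines of numpy. This is just a sketch, not the project's KNN class; k=5 is an arbitrary choice, and ±1 labels are assumed as in the rest of this document.

import numpy as np

def knn_predict(X_train, y_train, query, k=5):
    # distance from the query point to every training sample
    dists = np.linalg.norm(X_train - query, axis=1)
    # indices of the k nearest training samples
    nearest = np.argsort(dists)[:k]
    # majority vote over the ±1 labels of the neighbors
    return 1 if y_train[nearest].sum() >= 0 else -1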

from portfolio.knn import KNN

y = df.iloc[50:, 4].values
y = np.where(y == 'Iris-versicolor', -1, 1)
X = df.iloc[50:, [0, 3]].values
np.c_[y, X]
knn = KNN(X, y)
knn.predict([[0, 0]]).item()
knn.predict([[3, 8]]).item()

Given the same data as the SVM example, we can see that KNN classifies all of the training data correctly, unlike the lossy linear classifier. This isn't necessarily an argument in its favor, since it doesn't account for the possibility of overfitting, to which KNN is always susceptible.

Given certain datasets, including this one, the performance is very good.

from portfolio import visualize

fig = plt.figure()
visualize.plot_decision_regions(fig.add_subplot(), X, y, knn)
Plot.embed(fig)
