A basic K-Nearest Neighbors Classification Algorithm using Go by kamil5b
The inputs for this package are a dataset that contains a classification column and attribute columns, and an unclassified data point with the same attributes as the dataset.
type KNNData struct {
	Classification string
	Attributes     []float64
	Distance       float64
}
Each data row in a dataset has a classification column and attribute columns, and is converted to the KNNData type. Every attribute has to be converted to float64 before it is stored in the KNNData Attributes field.
The data rows are stored together in a KNNData array.
The input is an unclassified data point whose attributes match those of the dataset.
Calculate the distance between the input data and every other row in the dataset using one of:
- Euclidean Distance
- Manhattan Distance
- Minkowski Distance
- Chebyshev/Supremum Distance
- Cosine Distance
- Jaccard Distance
Euclidean distance is the length of a line segment between the two points.
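Manhattan distance is the sum of the absolute differences between the coordinates of the two points.
Minkowski distance is a generalization of the Euclidean and Manhattan distances with an order p: the p-th root of the sum of the absolute coordinate differences, each raised to the power p.
Chebyshev a.k.a. Supremum distance is the largest absolute difference between the two points along any single coordinate.
Cosine distance is one minus the cosine of the angle between the two attribute vectors.
Jaccard distance is one minus the Jaccard similarity, which is the size of the intersection of two sets divided by the size of their union.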
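As a rough illustration of the definitions above, here is a minimal sketch of the first five metrics for a single pair of points. The function names are hypothetical and are not the knnGo API. The sketch assumes both slices have the same length and, for the cosine distance, that neither vector is all zeros. Jaccard distance is left out because it is defined on sets rather than on coordinate vectors.

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the length of the line segment between a and b.
// Every helper in this sketch assumes len(a) == len(b).
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// manhattan returns the sum of the absolute coordinate differences.
func manhattan(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum
}

// minkowski returns the order-p distance; p = 1 gives Manhattan
// and p = 2 gives Euclidean. Assumes p >= 1.
func minkowski(a, b []float64, p int) float64 {
	sum := 0.0
	for i := range a {
		sum += math.Pow(math.Abs(a[i]-b[i]), float64(p))
	}
	return math.Pow(sum, 1/float64(p))
}

// chebyshev returns the largest absolute coordinate difference.
func chebyshev(a, b []float64) float64 {
	largest := 0.0
	for i := range a {
		if d := math.Abs(a[i] - b[i]); d > largest {
			largest = d
		}
	}
	return largest
}

// cosine returns one minus the cosine of the angle between a and b.
// Assumes neither vector is all zeros (otherwise the result is NaN).
func cosine(a, b []float64) float64 {
	dot, normA, normB := 0.0, 0.0, 0.0
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

func main() {
	a := []float64{0, 0}
	b := []float64{3, 4}
	fmt.Println("euclidean:", euclidean(a, b))
	fmt.Println("manhattan:", manhattan(a, b))
	fmt.Println("minkowski (p=3):", minkowski(a, b, 3))
	fmt.Println("chebyshev:", chebyshev(a, b))
	fmt.Println("cosine:", cosine([]float64{1, 0}, []float64{0, 1}))
}
```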
To use this package, you have to download and install this repository.
Run this in your terminal:
go get github.com/kamil5b/knnGo
To use this package in your code, you also need to import it:
import "github.com/kamil5b/knnGo"
For this package to work, the dataset you are using must be converted to an array of KNNData where the classification is a string and the attributes are float64. (If an attribute is a ranked enum, you may convert it to an integer that holds the rank of the enum value and then convert that integer to a discrete float64.)
type KNNData struct {
	Classification string
	Attributes     []float64
	Distance       float64
}
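The conversion described above might look like the following sketch. The helper rowsToKNNData, the sample rows and the assumption that the class label sits in the last column are all hypothetical and not part of knnGo. The KNNData type is redeclared locally only so that the example compiles on its own.

```go
package main

import (
	"fmt"
	"strconv"
)

// KNNData mirrors the struct shown above so this sketch is self-contained.
type KNNData struct {
	Classification string
	Attributes     []float64
	Distance       float64
}

// rowsToKNNData is a hypothetical helper. It assumes the last column of
// every row is the class label and all other columns are numeric text.
func rowsToKNNData(rows [][]string) ([]KNNData, error) {
	dataset := make([]KNNData, 0, len(rows))
	for i, row := range rows {
		if len(row) < 2 {
			return nil, fmt.Errorf("row %d: need at least one attribute and a label", i)
		}
		last := len(row) - 1
		attrs := make([]float64, 0, last)
		for _, cell := range row[:last] {
			v, err := strconv.ParseFloat(cell, 64)
			if err != nil {
				return nil, fmt.Errorf("row %d: %w", i, err)
			}
			attrs = append(attrs, v)
		}
		dataset = append(dataset, KNNData{Classification: row[last], Attributes: attrs})
	}
	return dataset, nil
}

func main() {
	// Made-up rows, for illustration only.
	rows := [][]string{
		{"5.1", "3.5", "setosa"},
		{"6.2", "2.9", "versicolor"},
	}
	dataset, err := rowsToKNNData(rows)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	for _, d := range dataset {
		fmt.Println(d.Classification, d.Attributes)
	}
}
```

A ranked enum column would need one extra step before this: map each enum value to its integer rank, then convert that rank with float64(rank).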
func KNNClassification(k int, dataset []KNNData, inputAttributes []float64, distance string, p int) (KNNData, error)
This is the main function of this package. It returns a KNNData whose Classification is the predicted class of the input data and whose Attributes are the input attributes themselves.
- k is the number of nearest neighbors to observe
- dataset is an array of KNNData converted from your own dataset
- inputAttributes is an array of float64 that contains the input attributes
- distance is the name of the distance metric, passed as a case-insensitive string. The available distances are :
- p is an integer giving the order (the p value) of the Minkowski distance. The value doesn't matter if you are not choosing the Minkowski distance
This function calculates the distance between the input data and every row in a copy of the dataset, sorts the rows in ascending order of distance, and picks the top k. The classification of the input data is the most common class among those k neighbors.
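The flow described above can be sketched as follows. This is not the package's source: classify is a hypothetical stand-in that is fixed to Euclidean distance, returns only the class label, and breaks ties in favor of the class that reaches the winning count first. Those choices are assumptions made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// KNNData mirrors the struct shown earlier so this sketch is self-contained.
type KNNData struct {
	Classification string
	Attributes     []float64
	Distance       float64
}

// classify is a hypothetical illustration of the copy, score, sort,
// take-top-k and vote steps. It always uses Euclidean distance.
func classify(k int, dataset []KNNData, input []float64) (string, error) {
	if k < 1 || k > len(dataset) {
		return "", errors.New("k must be between 1 and the dataset size")
	}
	// Work on a copy so the caller's slice is neither reordered nor rescored.
	scored := make([]KNNData, len(dataset))
	copy(scored, dataset)
	for i := range scored {
		if len(scored[i].Attributes) != len(input) {
			return "", fmt.Errorf("row %d: attribute count mismatch", i)
		}
		sum := 0.0
		for j, v := range scored[i].Attributes {
			d := v - input[j]
			sum += d * d
		}
		scored[i].Distance = math.Sqrt(sum)
	}
	// Ascending by distance; the stable sort keeps dataset order among equal distances.
	sort.SliceStable(scored, func(a, b int) bool {
		return scored[a].Distance < scored[b].Distance
	})
	// Count classes among the k nearest rows.
	votes := make(map[string]int)
	best := ""
	for _, neighbor := range scored[:k] {
		votes[neighbor.Classification]++
		// Strict '>' means a later class must beat, not just match, the leader.
		if votes[neighbor.Classification] > votes[best] {
			best = neighbor.Classification
		}
	}
	return best, nil
}

func main() {
	// Made-up points, for illustration only.
	dataset := []KNNData{
		{Classification: "A", Attributes: []float64{0, 0}},
		{Classification: "A", Attributes: []float64{0, 1}},
		{Classification: "A", Attributes: []float64{1, 0}},
		{Classification: "B", Attributes: []float64{5, 5}},
		{Classification: "B", Attributes: []float64{6, 5}},
	}
	label, err := classify(3, dataset, []float64{0.2, 0.1})
	if err != nil {
		fmt.Println("classification failed:", err)
		return
	}
	fmt.Println("predicted class:", label)
}
```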
func (d KNNData) PrintKNNData()
This function prints the KNNData.
func ValueVote(arr []string) string
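Only the signature of ValueVote is given here. Assuming from its name and its use in classification that it returns the most frequent string in the array, a majority vote could be sketched like this. The function majorityVote and its tie-breaking rule (the value that reaches the highest count first wins) are hypothetical and may differ from what the package does.

```go
package main

import "fmt"

// majorityVote is a hypothetical sketch of a most-frequent-value vote.
// It returns "" for an empty slice. On a tie, the value that reached
// the highest count first is returned.
func majorityVote(labels []string) string {
	counts := make(map[string]int, len(labels))
	winner := ""
	for _, label := range labels {
		counts[label]++
		if counts[label] > counts[winner] {
			winner = label
		}
	}
	return winner
}

func main() {
	fmt.Println(majorityVote([]string{"cat", "dog", "dog"}))
}
```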
The functions in this file implement the distance metrics that can be used in this package. Each one returns an array of KNNData in which every element holds its computed distance.
func EuclideanDistance
func EuclideanDistance(dataset []KNNData, inputAttributes []float64) ([]KNNData, error)
func ManhattanDistance
func ManhattanDistance(dataset []KNNData, inputAttributes []float64) ([]KNNData, error)
func MinkowskiDistance
func MinkowskiDistance(dataset []KNNData, inputAttributes []float64, p int) ([]KNNData, error)
func ChebyshevDistance
func ChebyshevDistance(dataset []KNNData, inputAttributes []float64) ([]KNNData, error)
func SupremumDistance
func SupremumDistance(dataset []KNNData, inputAttributes []float64) ([]KNNData, error)
func CosineDistance
func CosineDistance(dataset []KNNData, inputAttributes []float64) ([]KNNData, error)
func JaccardDistance
func JaccardDistance(dataset []KNNData, inputAttributes []float64) ([]KNNData, error)
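To illustrate the shape these signatures share, here is a minimal sketch of a function that takes a dataset and the input attributes and returns a new array with the Distance field filled in. The function manhattanAll is hypothetical and is not the package's ManhattanDistance. Returning an error on an attribute count mismatch is an assumption about what the error return is for.

```go
package main

import (
	"fmt"
	"math"
)

// KNNData mirrors the struct shown earlier so this sketch is self-contained.
type KNNData struct {
	Classification string
	Attributes     []float64
	Distance       float64
}

// manhattanAll is a hypothetical example of the shared signature shape.
// It returns a new slice and leaves the caller's dataset unchanged.
func manhattanAll(dataset []KNNData, inputAttributes []float64) ([]KNNData, error) {
	scored := make([]KNNData, len(dataset))
	for i, row := range dataset {
		if len(row.Attributes) != len(inputAttributes) {
			return nil, fmt.Errorf("row %d has %d attributes, input has %d",
				i, len(row.Attributes), len(inputAttributes))
		}
		sum := 0.0
		for j, v := range row.Attributes {
			sum += math.Abs(v - inputAttributes[j])
		}
		// row is a copy of dataset[i], so this does not modify the original.
		row.Distance = sum
		scored[i] = row
	}
	return scored, nil
}

func main() {
	// Made-up points, for illustration only.
	dataset := []KNNData{
		{Classification: "A", Attributes: []float64{1, 2}},
		{Classification: "B", Attributes: []float64{4, 6}},
	}
	scored, err := manhattanAll(dataset, []float64{0, 0})
	if err != nil {
		fmt.Println("distance failed:", err)
		return
	}
	for _, d := range scored {
		fmt.Println(d.Classification, d.Distance)
	}
}
```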