Coding Challenge
For this coding challenge, I was asked to implement a KMeans clustering algorithm similar to the one in sklearn.
For comparison, I first wrote an example with a few graphs on a randomly generated dataset. Then I wrote my own implementation of the KMeans algorithm, aiming for results comparable to sklearn's. The assignment was to be completed within one hour.
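As a rough illustration of what such an implementation involves, here is a minimal sketch of the standard Lloyd's iteration for KMeans using NumPy. It is not the code written for the challenge: the function name `kmeans`, its parameters, the random initialisation, and the synthetic four-blob dataset are all assumptions made for this example.

```python
import numpy as np


def nearest_centroid(X, centroids):
    """Return, for every point in X, the index of its closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Hypothetical Lloyd's-algorithm sketch; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct points drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = nearest_centroid(X, centroids)
        # Update step: move each centroid to the mean of its points,
        # keeping the old position if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Recompute labels so they match the returned centroids.
    return centroids, nearest_centroid(X, centroids)


# Synthetic data (an assumption): four Gaussian blobs in 2-D.
rng = np.random.default_rng(42)
blob_centers = np.array([[1, 1], [1, 3], [3, 1], [3, 3]], dtype=float)
X = np.vstack([c + 0.1 * rng.standard_normal((50, 2)) for c in blob_centers])

centroids, labels = kmeans(X, k=4)
print(centroids)
```

Because the starting centroids are random, a run like this can settle in a local optimum. sklearn reduces that risk with k-means++ initialisation and multiple restarts, which this sketch leaves out.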
Here is an example plot of the clusters. It includes centroid values from both my implementation and the sklearn implementation for comparison:
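Here is a table showing the centroid values from each implementation. Cluster numbering is arbitrary in both implementations, so matching centroids do not necessarily appear in the same row: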
| | My Implementation (X) | My Implementation (Y) | sklearn (X) | sklearn (Y) |
|---|---|---|---|---|
| Centroid 1 | 1.10041362 | 3.05174259 | 1.04798811 | 1.1345495 |
| Centroid 2 | 1.00438464 | 1.11955788 | 3.18576148 | 0.9072616 |
| Centroid 3 | 3.05447363 | 2.65856700 | 3.05447363 | 2.658567 |
| Centroid 4 | 3.15083859 | 0.92733088 | 1.10041362 | 3.05174259 |
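Since the two implementations number their clusters differently, a row-by-row comparison of the table above is misleading. The following sketch pairs each of my centroids with its nearest sklearn centroid, using the values from the table. The variable names are assumptions, and nearest-neighbour pairing is a simplification that works here because the clusters are well separated; it is not a general assignment method.

```python
import numpy as np

# Centroid values copied from the table above.
mine = np.array([
    [1.10041362, 3.05174259],
    [1.00438464, 1.11955788],
    [3.05447363, 2.65856700],
    [3.15083859, 0.92733088],
])
sk = np.array([
    [1.04798811, 1.1345495],
    [3.18576148, 0.9072616],
    [3.05447363, 2.658567],
    [1.10041362, 3.05174259],
])

# Pairwise Euclidean distances: dists[i, j] = |mine[i] - sk[j]|.
dists = np.linalg.norm(mine[:, None, :] - sk[None, :, :], axis=2)

# For each of my centroids, pick the closest sklearn centroid.
nearest = dists.argmin(axis=1)
for i, j in enumerate(nearest):
    print(f"my centroid {i + 1} -> sklearn centroid {j + 1}, "
          f"distance {dists[i, j]:.4f}")
```

With this pairing, two of the centroids coincide exactly and the other two differ by less than 0.05 in Euclidean distance.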
To determine the initial number of clusters, I also made the following elbow plot:
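For reference, the data behind an elbow plot can be computed by fitting KMeans for a range of cluster counts and recording the inertia (the within-cluster sum of squared distances) for each. This is a minimal sketch using sklearn; the dataset generated with `make_blobs`, the range of `k`, and the parameter values are assumptions, not the ones used in the challenge.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (an assumption): 300 points around 4 centres.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.3, random_state=0)

# Fit KMeans for each candidate k and record the inertia.
ks = list(range(1, 9))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
]

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 2))
```

Plotting `inertias` against `ks` (for example with matplotlib) gives the elbow curve. The "elbow" is the value of `k` after which the inertia stops dropping sharply.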