k-means-without-libraries's Introduction

K-Means-without-ML-libraries

Dataset Description

The provided dataset consists of 150 samples in total, split across two files, irirs_train.csv and irirs_test.csv, which contain 130 and 20 samples, respectively. Because the dataset is the Iris flower dataset, I assumed the following column names:

Column 1 - Sepal Length in cm
Column 2 - Sepal Width in cm
Column 3 - Petal Length in cm
Column 4 - Petal Width in cm
Column 5 - Species: Iris-Setosa, Iris-Versicolor and Iris-Virginica
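
The column layout above can be read with the standard library alone. The following is a minimal sketch, assuming the files have no header row and are comma-separated; the names `COLUMNS` and `load_iris` are hypothetical and do not come from the repository.

```python
import csv
import io

# Column names assumed from the description above (not read from the files).
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

def load_iris(lines):
    """Parse header-less CSV rows into (features, labels).

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    features, labels = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        features.append([float(value) for value in row[:4]])
        labels.append(row[4].strip())
    return features, labels

# Two in-memory rows in the assumed format, used here instead of a real file.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
X, y = load_iris(sample)
```

To read the real training file, the same function can be given an open file handle, for example `with open("irirs_train.csv", newline="") as f: X, y = load_iris(f)`.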

Findings

Data Cleaning and normalization: No missing or null values were found in the input dataset. A min-max scaler was computed to normalize the data before passing it to the algorithm.
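
Min-max normalization rescales each feature to the range [0, 1] using x' = (x - min) / (max - min). The sketch below shows one way to do this without ML libraries. The function names are hypothetical, and fitting the scaler on the training rows and reusing it for the test rows is an assumption, not something the text confirms.

```python
def fit_min_max(rows):
    """Return per-column minimums and maximums of a list of numeric rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows, mins, maxs):
    """Scale each value to (v - min) / (max - min); constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled

# Toy rows with two features, used only to illustrate the scaling.
train = [[5.1, 3.5], [7.0, 3.2], [6.3, 3.3]]
mins, maxs = fit_min_max(train)
scaled = transform_min_max(train, mins, maxs)
```

If the same `mins` and `maxs` are applied to unseen rows, values outside the training range will fall outside [0, 1].
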
Correlation Analysis: Outcomes of the correlation analysis:
• Setosa petal lengths and widths are much smaller than those of Versicolor and Virginica.
• There is a strong linear relationship between all the variables except sepal width, whose relationship is much weaker and negative.
The table below identifies trends between variables: depending on the strength of the relationship, it assigns each pair a number between -1 and 1.

Looking at the correlation table below, we can see three main variables (sepal length, petal length and petal width) that have a strong linear relationship with species_id. These variables are likely to be strong predictors of the species of a given sample.
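
A coefficient between -1 and 1 of the kind described above can be computed as a Pearson correlation. The sketch below is a plain-Python version, assuming that species_id is an integer encoding of the species; the function name `pearson` and the toy values are illustrative and are not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values: a perfectly increasing and a perfectly decreasing relationship.
r_positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson([1, 2, 3, 4], [8, 6, 4, 2])
```

Calling `pearson` on one feature column and the species_id column gives one cell of the correlation table; the function divides by zero if either input is constant, so such columns would need to be handled separately.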
