shaikriyazsandy / clustering


Clustering

Problem Statement: Perform clustering (hierarchical, K-means, and DBSCAN) on the airlines data to obtain the optimum number of clusters.

  • Cluster analysis is an unsupervised learning method that separates data points into groups, such that points in the same group have similar properties and points in different groups have dissimilar properties.

  • It comprises many different methods based on different distance measures, e.g. K-Means (distance between points), Affinity propagation (graph distance), Mean-shift (distance between points), DBSCAN (distance between nearest points), Gaussian mixtures (Mahalanobis distance to centers), Spectral clustering (graph distance), etc.

  • At their core, all of these methods follow the same approach: first we calculate similarities (or distances) between data points, and then we use them to group the points into clusters. Here we will focus on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method.
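The density-based idea behind DBSCAN can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the function names `region_query` and `dbscan`, the toy `points`, and the parameter values are all assumptions chosen for the example. The parameter names `eps` and `min_samples` follow common usage, and in a notebook a library implementation such as scikit-learn's `DBSCAN` would more likely be used.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand it again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical toy data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_samples=3))
# prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike K-means, the number of clusters is not passed in; it follows from `eps` and `min_samples`, and the isolated point is reported as noise rather than forced into a cluster.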

1) Case Summary

East-West Airlines is trying to learn more about its customers. Key issues are their flying patterns, their earning and use of frequent flyer rewards, and their use of the airline credit card. The task is to identify customer segments via clustering. The file EastWestAirlines.xls contains information on 4000 passengers who belong to an airline’s frequent flier program. For each passenger, the data include information on their mileage history and on the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers that have similar characteristics, for the purpose of targeting different segments with different types of mileage offers.

1.1 Data Description:

The file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger, the data include information on their mileage history and on the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers that have similar characteristics, for the purpose of targeting different segments with different types of mileage offers.

  • ID: Unique ID

  • Balance: Number of miles eligible for award travel

  • Qual_mile: Number of miles counted as qualifying for Topflight status

  • cc1_miles: Number of miles earned with the frequent flyer credit card in the past 12 months

  • cc2_miles: Number of miles earned with the Rewards credit card in the past 12 months

  • cc3_miles: Number of miles earned with the Small Business credit card in the past 12 months

  • Note: the three credit card fields above are coded as 1 = under 5,000; 2 = 5,000 - 10,000; 3 = 10,001 - 25,000; 4 = 25,001 - 50,000; 5 = over 50,000

  • Bonus_miles: Number of miles earned from non-flight bonus transactions in the past 12 months

  • Bonus_trans: Number of non-flight bonus transactions in the past 12 months

  • Flight_miles_12mo: Number of flight miles in the past 12 months

  • Flight_trans_12: Number of flight transactions in the past 12 months

  • Days_since_enrolled: Number of days since enrollment in the flier program

  • Award: Whether the passenger had an award flight (free flight) or not
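The 1 to 5 coding in the note for the credit card mileage fields can be written out as a small lookup. This is only a sketch of that coding: the function name `miles_to_code` is hypothetical, and it assumes the bin edges are exactly as stated in the note.

```python
def miles_to_code(miles):
    """Map a 12-month mileage total to the 1-5 code described in the note."""
    if miles < 5_000:
        return 1  # under 5,000
    if miles <= 10_000:
        return 2  # 5,000 - 10,000
    if miles <= 25_000:
        return 3  # 10,001 - 25,000
    if miles <= 50_000:
        return 4  # 25,001 - 50,000
    return 5      # over 50,000


# Hypothetical mileage totals, not values from the data set.
for miles in (1_200, 5_000, 18_000, 50_000, 75_000):
    print(miles, "->", miles_to_code(miles))
```

Because these fields are ordinal codes rather than raw mileage, a difference of 1 between two codes covers a very different number of miles depending on the bin, which is worth keeping in mind when they enter a distance calculation alongside fields such as Balance.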

Problem Statement

Perform clustering (hierarchical, K-means, and DBSCAN) on the crime data, identify the number of clusters formed, and draw inferences from the clusters obtained.
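For K-means, one common way to choose the number of clusters is the elbow heuristic: run the algorithm for several values of k and look for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. The sketch below assumes that heuristic and is not taken from the notebook. The helper names, the toy `points`, and the deterministic "first k points" initialization are illustrative choices; a library such as scikit-learn's `KMeans` would normally be used instead.

```python
from math import dist


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm, starting from the first k points as centers."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    labels = assign(points, centers)
    wcss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wcss


# Hypothetical toy data: three small, well-separated groups.
points = [(1, 1), (8, 8), (1, 8), (1, 2), (2, 1),
          (8, 9), (9, 8), (1, 9), (2, 8)]

wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.2f}")
```

On these toy points the WCSS falls steeply up to k = 3 and only slightly after that, which is the "elbow" that suggests three clusters. On real data the bend is usually less clear-cut, so the result is a guide rather than a definitive answer.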

1. Content

  • This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. Also given is the percentage of the population living in urban areas. The analysis takes a systematic approach to identifying and analyzing patterns and trends in crime using the USArrests data set, a data frame with 50 observations on 4 variables.

  • Murder: numeric, murder arrests (per 100,000)

  • Assault: numeric, assault arrests (per 100,000)

  • UrbanPop: numeric, percentage of the population living in urban areas

  • Rape: numeric, rape arrests (per 100,000)
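Because the four variables are on different scales (arrest rates and a percentage), a common first step before distance-based clustering is to standardize each column. The sketch below shows that step followed by a simple average-linkage hierarchical clustering. It is an illustration under stated assumptions: the rows are made-up values in the (Murder, Assault, UrbanPop, Rape) layout, not records from the data set, and the function names `standardize` and `agglomerative` are hypothetical. A notebook would more likely rely on SciPy or scikit-learn for this.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no single variable dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds))
            for row in rows]


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return mean(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up rows in the order (Murder, Assault, UrbanPop, Rape).
rows = [
    (2.0, 50.0, 50.0, 10.0),
    (2.5, 60.0, 55.0, 11.0),
    (3.0, 55.0, 52.0, 9.0),
    (12.0, 250.0, 70.0, 30.0),
    (13.0, 260.0, 75.0, 32.0),
    (11.5, 240.0, 72.0, 28.0),
]

scaled = standardize(rows)
clusters = agglomerative(scaled, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
# prints [[0, 1, 2], [3, 4, 5]]
```

Without the standardization step, the Assault column would dominate the Euclidean distances simply because its values are numerically the largest.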
