
parissashahabi / behavioral-data-clustering-and-gender-correlation-analysis

Clustered behavioral data into two groups without using gender, checked how well the clusters match the gender split, and evaluated clustering quality with silhouette and Davies-Bouldin scores. Also identified the optimal cluster count with the elbow method and re-evaluated the clustering at that count.

Jupyter Notebook 100.00%
clustering k-means-clustering machine-learning silhouette-score davies-bouldin-score elbow-method

Behavioral Data Clustering and Gender Correlation Analysis

Project Overview

This project analyzes and clusters a dataset of daily behaviors to investigate how these behaviors relate to gender. The primary goal is to first cluster the data into two groups without using the gender column, and then evaluate whether the resulting clusters align with the gender labels. The project uses the K-Means algorithm for clustering and assesses clustering quality with the silhouette score and the Davies-Bouldin score.
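The overall workflow can be sketched as follows. This is a minimal sketch, not the notebook's actual code: the layout of Q2.csv is not documented here, so the feature names (`sleep_hours`, `screen_time`, `exercise_min`) and the `Gender` column with values `F`/`M` are hypothetical, and synthetic data stands in for the real file. Standardizing the features before K-Means is also an assumption about the pre-processing.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for Q2.csv: the real column names are not documented,
# so these feature names and the "Gender" column are assumptions.
rng = np.random.default_rng(0)
n = 200
gender = rng.choice(["F", "M"], size=n)
df = pd.DataFrame({
    "sleep_hours": rng.normal(7, 1, n) + (gender == "M") * 0.5,
    "screen_time": rng.normal(5, 2, n),
    "exercise_min": rng.normal(30, 10, n) + (gender == "M") * 10,
    "Gender": gender,
})

# 1. Drop the gender column so it plays no part in the clustering.
X = df.drop(columns=["Gender"])

# 2. Standardize features so no single unit (hours vs. minutes) dominates distances.
X_scaled = StandardScaler().fit_transform(X)

# 3. K-Means with two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X_scaled)

# 4. Cross-tabulate cluster IDs against gender labels.
table = pd.crosstab(df["cluster"], df["Gender"])
print(table)

# Cluster IDs are arbitrary, so score agreement under the better of the
# two possible cluster-to-gender mappings.
match = (df["cluster"] == (df["Gender"] == "M").astype(int)).mean()
agreement = max(match, 1 - match)
print(f"Agreement with gender: {agreement:.2f}")
```

Because the agreement is taken over both possible mappings, it always lies between 0.5 and 1.0; values near 0.5 suggest the clusters carry little information about gender.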

Problem Statement

The challenge is to determine how consistent the clustering is with the gender classification and to evaluate the quality of the clustering itself. The project also uses the elbow method to find the optimal number of clusters for K-Means and re-evaluates the clustering with that cluster count.
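A minimal sketch of the elbow method, assuming scikit-learn's `KMeans` and synthetic blob data in place of the real features. The notebook may pick the elbow by inspecting the plot; the "farthest point from the chord" rule used here to choose `best_k` automatically is a common heuristic swapped in for illustration, not necessarily the author's method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the standardized behavioral features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)

# Fit K-Means for a range of k and record the inertia
# (within-cluster sum of squared distances to the centroids).
ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Normalize both axes to [0, 1] so the distance below is not dominated
# by the scale of the inertia values.
x = np.array(ks, dtype=float)
y = np.array(inertias)
xn = (x - x.min()) / (x.max() - x.min())
yn = (y - y.min()) / (y.max() - y.min())

# Heuristic elbow: the point farthest from the straight line (chord)
# joining the first and last points of the curve.
x1, y1, x2, y2 = xn[0], yn[0], xn[-1], yn[-1]
dist = np.abs((x2 - x1) * (y1 - yn) - (x1 - xn) * (y2 - y1)) / np.hypot(x2 - x1, y2 - y1)
best_k = ks[int(np.argmax(dist))]

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
print(f"Elbow heuristic suggests k={best_k}")
```

Plotting `ks` against `inertias` with matplotlib gives the usual elbow curve; the heuristic simply formalizes where the curve bends most sharply.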

Desired Outcomes

The project involves the following key steps:

  1. Data Analysis and Pre-processing: Initial exploration and preparation of the data for clustering.
  2. Clustering Model Development: Implementing the K-Means algorithm for data clustering.
  3. Evaluation of Clustering: Assessing the clustering results using silhouette score and Davies-Bouldin score.
  4. Optimization of Cluster Count: Determining the optimal number of clusters and re-evaluating the clustering.
  5. Detailed Documentation: Each step, including the rationale and results, is thoroughly documented in a PDF file.
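Both evaluation metrics in step 3 are available in `sklearn.metrics`. The sketch below computes them for a two-cluster K-Means fit on synthetic data (an assumption; in practice the scaled features from Q2.csv would be used). Note that both scores measure cluster compactness and separation only; neither uses the gender labels, so agreement with gender has to be checked separately, for example with a cross-tabulation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic two-group data standing in for the behavioral features (assumption).
X, _ = make_blobs(n_samples=300, centers=2, n_features=3, random_state=42)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Silhouette: mean over samples of (b - a) / max(a, b), where a is the mean
# distance to the sample's own cluster and b the mean distance to the nearest
# other cluster. Range [-1, 1]; higher is better.
sil = silhouette_score(X, labels)

# Davies-Bouldin: average, over clusters, of the worst ratio of within-cluster
# scatter to between-centroid distance. Lower bound 0; lower is better.
dbi = davies_bouldin_score(X, labels)

print(f"silhouette={sil:.3f}, davies_bouldin={dbi:.3f}")
```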

Repository Structure

  • HW2-2.ipynb: Jupyter notebook containing the entire analysis and clustering process.
  • Q2.csv: The dataset used for the analysis.
  • Report.pdf: A PDF file containing a detailed report of the analysis, results, and evaluations.

Key Results

  • The notebook includes a diagram comparing the cluster assignments with the actual gender labels, showing how closely the clustering matches the gender split.

    Clustering Results

  • Detailed evaluation of the clustering results using silhouette score and Davies-Bouldin score.

  • Discussion of the optimal number of clusters and re-evaluation of the clustering with this new cluster count.

    Elbow method for optimal k
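Re-evaluating the clustering at a new cluster count amounts to recomputing the same two scores across a range of k and comparing them with the two-cluster baseline. A minimal sketch, assuming synthetic data in place of Q2.csv; the helper name `evaluate` is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic data standing in for the standardized features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)


def evaluate(X, k, seed=0):
    """Fit K-Means with k clusters and return (silhouette, Davies-Bouldin)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels), davies_bouldin_score(X, labels)


# Both scores need at least two clusters, so start at k = 2.
results = {k: evaluate(X, k) for k in range(2, 9)}
for k, (sil, dbi) in results.items():
    print(f"k={k}: silhouette={sil:.3f}, davies_bouldin={dbi:.3f}")

# The two metrics can disagree, so report the best k under each.
best_by_sil = max(results, key=lambda k: results[k][0])
best_by_dbi = min(results, key=lambda k: results[k][1])
print(f"baseline k=2: silhouette={results[2][0]:.3f}, davies_bouldin={results[2][1]:.3f}")
print(f"best k by silhouette: {best_by_sil}, by Davies-Bouldin: {best_by_dbi}")
```

Cross-checking these scores against the elbow curve helps confirm whether a k other than 2 genuinely fits the data better.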

How to Use

  • Clone the repository.
  • Make sure Jupyter Notebook is installed, along with the required libraries: NumPy, pandas, matplotlib, seaborn, plotly, and scikit-learn (sklearn).
  • Run HW2-2.ipynb to view the analysis and results.
