
cluster-new's Introduction

License: MIT

Clustering Algorithms

We study frequently used clustering algorithms in this exercise.

Objectives

To study:

  1. Python openml
  2. Data preparation
  3. Clustering algorithms
  4. Performance measures
  5. Hyper-parameter tuning
  6. joblib library
  7. Data visualisation

Branch

Make sure you know which branch of this module is yours.
We continue to improve the modules based on your feedback and submitted outputs, so we create a branch for you when we assign the module.
Read the README.md of your branch.

ARE YOU READING THE RIGHT BRANCH?

If you are in any doubt, contact DataDisca.

This code is hosted in a private repository to regulate access. After completing the module, you can host your work in an open repository under the MIT license.

Please help us by reporting errors of any type.

License

This code is hosted in a private repository to regulate access. You can share your data and code under the MIT license.

Datasets:

Dataset                             Instances  Classes  Missing Values  URL
iris                                150        3        0               https://www.openml.org/d/61
wine                                178        3        0               https://www.openml.org/d/187
glass                               214        6        0               https://www.openml.org/d/41
haberman                            306        2        0               https://www.openml.org/d/43
libras_move                         360        15       0               https://www.openml.org/d/299
satellite_image                     6435       6        0               https://www.openml.org/d/294
isolet                              7797       26       0               https://www.openml.org/d/300
nursery                             12960      5        0               https://www.openml.org/d/26
gas-drift-different-concentrations  13910      6        0               https://www.openml.org/d/1477
MagicTelescope                      19020      2        0               https://www.openml.org/d/1120
letter                              20000      26       0               https://www.openml.org/d/6
covertype                           581012     7        0               https://www.openml.org/d/150

Instructions

Follow the steps given below.

Step 1

Study the following algorithms.

  1. K-Means
  2. Agglomerative
  3. DBSCAN
  4. OPTICS
  5. Gaussian mixtures
  6. Affinity propagation
  7. Mean-shift
  8. Spectral
  9. Ward hierarchical
  10. BIRCH
  11. Self-organising maps

At the end of the exercise, you should be able to answer the following questions.

  1. What are the important parameters in each algorithm?
  2. How and why do those parameters affect the results of the respective algorithms?
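
One way to keep track of the parameters you identify is to write them down as a search space per algorithm. The sketch below is only an illustration: the parameter names follow scikit-learn's `KMeans` and `DBSCAN` estimators, while the dictionary `PARAM_GRIDS`, the helper `expand_grid` and all value ranges are assumptions that you should replace with your own choices.

```python
from itertools import product

# Hypothetical search spaces. Parameter names follow scikit-learn's
# estimators; the value ranges are placeholders to adapt after studying
# each algorithm.
PARAM_GRIDS = {
    "KMeans": {"n_clusters": [2, 3, 4], "init": ["k-means++", "random"]},
    "DBSCAN": {"eps": [0.1, 0.3, 0.5], "min_samples": [3, 5, 10]},
}


def expand_grid(grid):
    """Yield one dict per combination of the values in `grid`."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


for algorithm, grid in PARAM_GRIDS.items():
    combos = list(expand_grid(grid))
    print(algorithm, len(combos), "combinations, first:", combos[0])
```

Each yielded dictionary can be passed to an estimator as keyword arguments, so the same loop works for any algorithm in the list.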

Step 2

Follow the steps given below to write your code.

  1. In your code, download a dataset using the Python openml package.
  2. Prepare the data:
    1. Identify the data types. These datasets contain boolean/categorical, ordinal and numeric variables; other datasets may contain further types.
    2. Transform categorical variables to numeric as necessary.
    3. Apply min-max normalisation.
  3. Write joblib code to walk through the parameter grid.
  4. Record f1_score, adjusted_rand_score, silhouette_score and execution time for each parameter combination identified in Step 1.
  5. Save the results to CSV files.
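
The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the function names `load_openml`, `prepare` and `evaluate` are hypothetical, K-Means stands in for any of the algorithms, one-hot encoding is only one possible way to make categorical variables numeric (ordinal variables may need an order-preserving encoding instead), and the demo runs on small synthetic data so that it works without network access. f1_score is left out here because cluster IDs must first be matched to class labels.

```python
import time

import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import MinMaxScaler


def load_openml(dataset_id):
    """Download a dataset by ID; needs the openml package and network access."""
    import openml  # imported lazily so the rest runs without it

    dataset = openml.datasets.get_dataset(dataset_id)
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
    return X, y


def prepare(X):
    """One-hot encode non-numeric columns, then min-max scale to [0, 1]."""
    X = pd.get_dummies(pd.DataFrame(X))
    return MinMaxScaler().fit_transform(X.astype(float))


def evaluate(X, y, params):
    """Fit K-Means with one parameter combination and time the fit."""
    start = time.perf_counter()
    labels = KMeans(n_init=10, random_state=0, **params).fit_predict(X)
    elapsed = time.perf_counter() - start
    return {
        **params,
        "adjusted_rand_score": adjusted_rand_score(y, labels),
        "silhouette_score": silhouette_score(X, labels),
        "seconds": elapsed,
    }


# Synthetic stand-in for a downloaded dataset: one numeric and one
# categorical column, two well-separated groups.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "length": np.concatenate([rng.normal(0, 0.3, 30), rng.normal(5, 0.3, 30)]),
    "colour": ["red"] * 30 + ["blue"] * 30,
})
y = np.array([0] * 30 + [1] * 30)
X = prepare(raw)

# Walk the grid in parallel. Threads keep the sketch portable; joblib's
# default process-based backend may suit long CPU-bound runs better.
grid = [{"n_clusters": k} for k in (2, 3, 4)]
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(evaluate)(X, y, params) for params in grid
)
for row in results:
    print(row)
```

For a real run, replace the synthetic block with something like `X, y = load_openml(61)` (the iris ID from the table above) followed by `prepare(X)`.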

Step 3

  1. Execute your code over all the algorithms and all the datasets.
  2. Save your results to CSV files.
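
Because each algorithm has different parameters, the result rows will not all share the same columns. A minimal sketch of one way to handle this with the standard library is shown below; the helper `save_results`, the file name `results_example.csv` and the rows themselves are placeholders, not measured results.

```python
import csv
from pathlib import Path


def save_results(rows, path):
    """Write a list of result dicts to `path`, one row per parameter combination."""
    # Collect the union of keys so algorithms with different parameters
    # can share one file; missing values are left blank.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with Path(path).open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)


# Placeholder rows, not measured results.
rows = [
    {"dataset": "iris", "algorithm": "KMeans", "n_clusters": 3, "seconds": 0.0},
    {"dataset": "iris", "algorithm": "DBSCAN", "eps": 0.3, "seconds": 0.0},
]
save_results(rows, "results_example.csv")
```

A single wide file like this is convenient for Tableau or Plotly; one file per algorithm is an equally valid layout.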

Step 4

  1. Create Tableau Dashboards or Plotly visualisations to analyse your results.
  2. With visualisations:
    1. For each dataset, compare and contrast the results produced by each algorithm under its optimal parameter settings.
    2. For each algorithm, show how f1_score, adjusted_rand_score and silhouette_score vary with the parameter values.
    3. Discuss the execution times of the algorithms under their parameter settings.
  3. What are the most important parameters in each algorithm?
  4. Create a presentation or a pdf document or a Jupyter Notebook explaining the theory and applications of f1_score, adjusted_rand_score and silhouette_score.
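
When explaining the three scores, it helps to show that f1_score is a classification metric: it compares label values directly, whereas cluster IDs are arbitrary. adjusted_rand_score does not depend on how clusters are named, and silhouette_score needs no ground truth at all. The sketch below illustrates one possible way to make f1_score meaningful for clustering, assuming that clusters are matched to classes by maximum overlap with `scipy.optimize.linear_sum_assignment`. The helper `align_clusters` and the toy labels are hypothetical, and this matching is an assumption rather than a method prescribed by this module.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, f1_score
from sklearn.metrics.cluster import contingency_matrix


def align_clusters(y_true, y_pred):
    """Relabel clusters with the class labels that maximise total overlap."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # Rows are true classes, columns are predicted clusters.
    overlap = contingency_matrix(y_true, y_pred)
    # Default: every cluster takes its majority class (covers the case of
    # more clusters than classes).
    mapping = {c: classes[overlap[:, i].argmax()] for i, c in enumerate(clusters)}
    # One-to-one matching overrides the default where it applies.
    rows, cols = linear_sum_assignment(overlap, maximize=True)
    mapping.update({clusters[c]: classes[r] for r, c in zip(rows, cols)})
    return np.array([mapping[c] for c in y_pred])


# Toy labels: the clustering is nearly right, but its IDs are permuted.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]

aligned = align_clusters(y_true, y_pred)
print("F1 before alignment:", f1_score(y_true, y_pred, average="macro"))
print("F1 after alignment: ", f1_score(y_true, aligned, average="macro"))
print("ARI (unchanged by relabelling):", adjusted_rand_score(y_true, y_pred))
```

The macro average is used here only as an example; whether macro, micro or weighted averaging is appropriate depends on how balanced the classes are.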

Quality Standard of Your Work

  1. Code should follow the PEP 8 standard.
  2. Host your code on GitHub in a public or private repository, as you prefer.
    • If it is a public repository, send us the link to evaluate.
    • If it is a private repository, share it (view only) with our GitHub usernames. Please contact us for them.

Send us a notification to start the evaluation. We evaluate your code for your technical progress.

Sponsor

DataDisca Pty Ltd, Melbourne, Australia

https://www.datadisca.com
