pyclustertend's Introduction

pyclustertend

pyclustertend is a Python package specialized in cluster tendency. Cluster tendency assessment determines whether applying clustering algorithms to a dataset is meaningful at all.

Three methods for assessing cluster tendency are currently implemented, plus one additional method based on metrics obtained with a KMeans estimator:

  • Hopkins Statistics

  • VAT

  • iVAT

  • Metric-based method (silhouette, Calinski-Harabasz, Davies-Bouldin)
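The metric-based method can be illustrated by fitting KMeans over a range of cluster counts and recording the three internal validity scores that scikit-learn provides. This is a minimal sketch under our own assumptions, not pyclustertend's API: the helper name `kmeans_metric_scores`, the default range of k and `n_init=10` are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def kmeans_metric_scores(X, k_values=range(2, 11), random_state=0):
    """Fit KMeans for each k and collect three internal validity metrics.

    Hypothetical helper for illustration; not part of pyclustertend.
    """
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = {
            "silhouette": silhouette_score(X, labels),                # in [-1, 1], higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        }
    return scores
```

A reasonable reading is that uniformly poor scores for every k (low silhouette, high Davies-Bouldin) point to weak cluster tendency.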

Installation

    pip install pyclustertend

Usage

Example Hopkins

    >>> from sklearn import datasets
    >>> from pyclustertend import hopkins
    >>> from sklearn.preprocessing import scale
    >>> X = scale(datasets.load_iris().data)
    >>> hopkins(X, 150)
    0.18950453452838564
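The Hopkins statistic compares nearest-neighbour distances of real observations with those of points drawn uniformly at random in the data's bounding box. The sketch below is a minimal NumPy illustration, not pyclustertend's implementation; `hopkins_sketch` and its `seed` parameter are hypothetical names. It assumes the convention in which a score near 0.5 means no clusterability and a score near 0 means a strong cluster tendency, which is consistent with the score of about 0.19 obtained on scaled iris.

```python
import numpy as np


def hopkins_sketch(X, m, seed=None):
    """Hopkins-style statistic: about 0.5 for uniform data, near 0 for clustered data.

    Illustrative sketch only; `hopkins_sketch` and `seed` are hypothetical names.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # w: distance from m sampled observations to their nearest *other* observation
    idx = rng.choice(n, size=m, replace=False)
    dist_real = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=-1)
    dist_real[np.arange(m), idx] = np.inf  # ignore each point's distance to itself
    w = dist_real.min(axis=1)
    # u: distance from m uniform points in the bounding box to their nearest observation
    U = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=-1).min(axis=1)
    return w.sum() / (w.sum() + u.sum())
```

Because both the sample and the uniform points are random, repeated calls without a fixed `seed` give slightly different scores.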

Example VAT

    >>> from sklearn import datasets
    >>> from pyclustertend import vat
    >>> from sklearn.preprocessing import scale
    >>> X = scale(datasets.load_iris().data)
    >>> vat(X)
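VAT (Visual Assessment of cluster Tendency) reorders the pairwise dissimilarity matrix so that similar observations sit next to each other; clusters then show up as dark blocks along the diagonal of the image. Below is a minimal NumPy sketch of the usual Prim-style reordering, assuming Euclidean distances; `vat_reorder` is a hypothetical helper, not part of pyclustertend.

```python
import numpy as np


def vat_reorder(X):
    """Return (VAT-ordered dissimilarity matrix, ordering) for data X of shape (n, d).

    Hypothetical helper for illustration; not part of pyclustertend.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean distances
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    selected = np.zeros(n, dtype=bool)
    selected[start] = True
    # best[k]: distance from point k to the nearest already-selected point
    best = D[start].copy()
    for _ in range(n - 1):
        # Pick the unselected point closest to the selected set (Prim's step)
        candidates = np.where(selected, np.inf, best)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        selected[nxt] = True
        best = np.minimum(best, D[nxt])
    order = np.array(order)
    return D[np.ix_(order, order)], order
```

To display the result, something like `matplotlib.pyplot.imshow(R, cmap="gray")` can be used. Building the full n × n matrix is what makes VAT expensive on large datasets.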

Example iVAT

    >>> from sklearn import datasets
    >>> from pyclustertend import ivat
    >>> from sklearn.preprocessing import scale
    >>> X = scale(datasets.load_iris().data)
    >>> ivat(X)
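iVAT (improved VAT) replaces each entry of the VAT-ordered matrix with the minimax path distance, i.e. the smallest possible largest edge on any path between two observations, which typically sharpens the diagonal blocks. The sketch below assumes its input `R` is already in VAT order and uses the row-by-row recursion in which each new point is linked to its closest earlier point; `ivat_transform` is a hypothetical name, not pyclustertend's function. It mirrors every completed row into the upper triangle, so the result is symmetric and later steps can read entries on either side of the diagonal.

```python
import numpy as np


def ivat_transform(R):
    """iVAT transform of a VAT-ordered dissimilarity matrix R of shape (n, n).

    Hypothetical helper for illustration; assumes R is already in VAT order.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    Dp = np.zeros_like(R)
    for r in range(1, n):
        # j: the earlier point closest to point r (its tree parent in VAT order)
        j = int(np.argmin(R[r, :r]))
        Dp[r, j] = R[r, j]
        for c in range(r):
            if c != j:
                # Path r -> j -> ... -> c: its largest edge is the max of the two parts
                Dp[r, c] = max(R[r, j], Dp[j, c])
        # Mirror row r into column r so later reads of Dp[j, c] with c > j are valid
        Dp[:r, r] = Dp[r, :r]
    return Dp
```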

Notes

It is preferable to scale the data before using the Hopkins or VAT algorithms, as they rely on distances between observations. Moreover, the VAT and iVAT algorithms do not scale well to massive datasets; a simple workaround is to sample the data before applying them.
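A minimal way to follow both recommendations is to standardize the data and then draw a random subsample without replacement before calling vat or ivat. The helper `prepare_for_vat` and its `max_samples=500` default are illustrative assumptions, not values recommended by pyclustertend.

```python
import numpy as np
from sklearn.preprocessing import scale


def prepare_for_vat(X, max_samples=500, random_state=0):
    """Standardize X and, if it has more than max_samples rows, subsample it.

    Hypothetical helper for illustration; not part of pyclustertend.
    """
    X = scale(np.asarray(X, dtype=float))  # zero mean, unit variance per column
    if X.shape[0] <= max_samples:
        return X
    # Random rows without replacement; a fixed seed keeps the plot reproducible
    rng = np.random.default_rng(random_state)
    idx = rng.choice(X.shape[0], size=max_samples, replace=False)
    return X[idx]
```

Scaling is done on the full dataset before sampling, so the subsample keeps the same units as the original data.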

pyclustertend's People

Contributors

dependabot[bot], lachhebo


pyclustertend's Issues

Package dependency

This package doesn't install on M1 Macs due to its dependency on version 0.24.2 of scikit-learn.
Could we bump the scikit-learn version (0.24.2 is old and should be updated to a version after the major release)?

Bug?

Hello,

First of all, thank you for implementing VAT and iVAT! In the function compute_ivat_ordered_dissimilarity_matrix, the iVAT matrix is computed, but it is not symmetric in the end. Is it possible that there is a bug? I fixed it for now with the following change. I changed

    return re_ordered_matrix

to

    for i in range(re_ordered_matrix.shape[0]):
        for j in range(i):
            re_ordered_matrix[j, i] = re_ordered_matrix[i, j]
    return re_ordered_matrix

Sincere regards!

Release package to conda-forge

Firstly - great project! Thank you for your great work!

Coming to the issue, I use conda to manage my python dependencies, and it'd be really great if it were possible to release the package on conda-forge, so that it can easily be installed using conda.

Would love to know your thoughts on the same, thanks!

New release

Hello.

I'd like to use your package to show the Hopkins test to my students, but unfortunately it does not work with the latest scikit-learn. The problem is the old name calinski_harabaz_score, which has been removed from the latest sklearn.

I see that it's already fixed in master, so could you please release version 1.4.9 to PyPI?

Thanks a lot and have a nice day.
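Until a fixed release is available, code that calls the metric directly can guard against the rename. A minimal sketch, assuming only that newer scikit-learn exposes `calinski_harabasz_score` and that older releases exposed `calinski_harabaz_score`:

```python
# Prefer the current name; fall back to the old misspelled name on old scikit-learn
try:
    from sklearn.metrics import calinski_harabasz_score
except ImportError:
    from sklearn.metrics import calinski_harabaz_score as calinski_harabasz_score

import numpy as np

# Two tight, well-separated groups give a large Calinski-Harabasz score
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
score = calinski_harabasz_score(X, labels)
```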

hopkins docs

Hello :)

I found some contradictory information on the hopkins test on https://pyclustertend.readthedocs.io/en/latest/

On the top it says:
If the test is positive (a Hopkins score which tends to 0), it means that clustering is useless for the dataset.

In the API section it says:
A score between 0 and 1, a score around 0.5 expresses no clusterability and a score tending to 0 expresses a high cluster tendency.
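Under the convention in the API section, the statistic can be written as follows, assuming m sampled observations, $w_i$ the distance from the i-th sampled observation to its nearest neighbour in the dataset, and $u_i$ the distance from the i-th uniformly generated point to its nearest observation:

$$
H = \frac{\sum_{i=1}^{m} w_i}{\sum_{i=1}^{m} u_i + \sum_{i=1}^{m} w_i}
$$

For uniformly random data, $w_i$ and $u_i$ have similar magnitudes, so $H \approx 0.5$; for clustered data the $w_i$ are small, so $H \to 0$. If the implementation follows this form, the API description is the consistent one.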

Error when import pyclustertend

I installed pyclustertend using pip install, conda install and several other ways, but I always get the same error when running the following code:

    import sys
    import os.path
    import os
    import pyclustertend
    from sklearn import datasets
    from pyclustertend import vat
    from sklearn.preprocessing import scale
    X = scale(df3)
    vat(X)

Error:

    ModuleNotFoundError                       Traceback (most recent call last)
    in
          2 import os.path
          3 import os
    ----> 4 import pyclustertend
          5 from sklearn import datasets
          6 from pyclustertend import vat

    ModuleNotFoundError: No module named 'pyclustertend'

I tried the following:

Way 1:

    (base) MacBook:~ pinky$ python -m pip install pyclustertend
    Requirement already satisfied: pyclustertend in /opt/anaconda3/lib/python3.7/site-packages/pyclustertend-1.4.9-py3.7.egg (1.4.9)

Way 2:

    (base) MacBook:~ pinky$ pip3 install pyclustertend
    Collecting pyclustertend
    Using cached pyclustertend-1.4.9-py3-none-any.whl (9.8 kB)
    Installing collected packages: pyclustertend
    Successfully installed pyclustertend-1.4.9

Way 3:

    (base) MacBook:~ pinky$ conda install pyclustertend
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Collecting package metadata (repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.

    PackagesNotFoundError: The following packages are not available from current channels:

      - pyclustertend

    Current channels:

    To search for alternate channels that may provide the conda package you're
    looking for, navigate to

        https://anaconda.org

    and use the search bar at the top of the page.

Way 4:

    (base) MacBook:~ pinky$ pip install pyclustertend==1.4.9
    Requirement already satisfied: pyclustertend==1.4.9 in /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages (1.4.9)

But I always get this error when running the code in a Jupyter notebook:

    ModuleNotFoundError: No module named 'pyclustertend'

Can anyone help me solve this problem? I am not very proficient in Python. Thanks a lot in advance.
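Judging from the paths in the output, pip reported installs under at least two different interpreters (an Anaconda Python 3.7 and a framework Python 3.6), so the notebook kernel may be running an interpreter that does not have the package. A quick diagnostic, run inside the notebook, is to print the kernel's interpreter and check whether the module is importable from it; `check_module` is a hypothetical helper built only on the standard library.

```python
import importlib.util
import sys


def check_module(name):
    """Print the running interpreter and report whether `name` is importable from it.

    Hypothetical helper for illustration.
    """
    found = importlib.util.find_spec(name) is not None
    print(f"Interpreter: {sys.executable}")
    if not found:
        # Installing with this exact interpreter avoids a pip/kernel mismatch
        print(f"'{name}' not found; try: {sys.executable} -m pip install {name}")
    return found
```

If it reports the module as missing, running the printed command, or `%pip install pyclustertend` in a notebook cell, installs the package into the environment the kernel actually uses.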
