erdogant / clusteval

Clusteval provides methods for unsupervised cluster validation

Home Page: https://erdogant.github.io/clusteval

License: Other

Python 0.87% Shell 0.01% Jupyter Notebook 99.12%

clustering unsupervised-clustering silhouette-method dbindex density-based-clustering validation machine-learning python


clusteval's Issues

Is pip not working anymore?

I get the following:
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000024B20CAB400>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/clusteval/
ERROR: Could not find a version that satisfies the requirement clusteval (from versions: none)
ERROR: No matching distribution found for clusteval
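
The `[Errno 11001] getaddrinfo failed` part of the message is a name-resolution failure on the machine running pip (11001 is the Windows "host not found" socket error), which suggests pip never reached the package index at all. A minimal sketch for checking this from Python is below. The helper name `can_resolve` and its `resolver` parameter are hypothetical, added only so the check can be exercised without a network.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to an address, False on a DNS failure.

    `resolver` defaults to socket.getaddrinfo; it is injectable (an assumption
    of this sketch) so the function can be tried out offline.
    """
    try:
        resolver(host, port)
    except socket.gaierror:
        # This is the exception family behind "getaddrinfo failed".
        return False
    return True
```

Running `can_resolve("pypi.org")` on the affected machine would return `False` if DNS, a proxy, or a firewall is the problem, in which case the fix lies in the network setup rather than in the package.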

AttributeError: 'clusteval' object has no attribute 'results'... DUP of closed issue #5?

I've tried versions from 2.0.0 up to the beta, and all of them produce the same error described in issue #5.

I'm on the latest HDBSCAN version (0.8.27) and the beta version of clusteval.

My array is as follows:

umap_rs_embed[:10]

array([[-0.16568227,  2.3830128 ,  0.9952151 ],
       [-0.16470274,  0.91874045,  1.6843276 ],
       [-0.10057875,  1.0044663 ,  4.231984  ],
       [ 7.218489  ,  3.189865  ,  1.6015646 ],
       [ 1.7666751 ,  2.4235313 ,  1.2277056 ],
       [-0.02537769,  1.1624466 ,  4.175513  ],
       [ 1.4869809 , -0.8690608 ,  2.6568232 ],
       [-0.05031788, -0.30832335,  0.93605393],
       [ 1.2532264 ,  1.6826892 ,  0.4620979 ],
       [ 1.3145269 ,  1.3296161 ,  3.9630399 ]], dtype=float32)

And the result of fitting is:

# Import library
from clusteval import clusteval
import hdbscan
# Set the method
ce = clusteval(method='hdbscan')
# Evaluate
results = ce.fit(umap_rs_embed)

AttributeError                            Traceback (most recent call last)
<ipython-input-6-68904b3307b8> in <module>
      5 ce = clusteval(method='hdbscan')
      6 # Evaluate
----> 7 results = ce.fit(umap_rs_embed)

~\Desktop\scripting\apps\trajectory\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    170 
    171         # Compute the dendrogram threshold
--> 172         if (self.cluster!='kmeans') and (self.results['labx'] is not None) and (len(np.unique(self.results['labx']))>1):
    173             # print(self.results['labx'])
    174             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'
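
The traceback shows `clusteval(method='hdbscan')` in the call, while the failing line tests `self.cluster`. One pattern consistent with that (an assumption, the source does not confirm it) is that `self.results` is only assigned inside branches that match known option values, so an option that matches no branch leaves the attribute unset. The classes below are hypothetical stand-ins that reproduce the pattern and show one defensive fix; they are not clusteval's code.

```python
class Evaluator:
    """Hypothetical stand-in for illustration; not clusteval's actual class."""

    def __init__(self, cluster="agglomerative"):
        self.cluster = cluster

    def fit(self, X):
        if self.cluster == "agglomerative":
            self.results = {"labx": [0] * len(X)}
        elif self.cluster == "dbscan":
            self.results = {"labx": [0] * len(X)}
        # No else branch: an unrecognised option leaves self.results unset,
        # so the next line raises AttributeError.
        return self.results


class StrictEvaluator(Evaluator):
    """Same sketch, but rejects unknown options before fitting."""

    SUPPORTED = ("agglomerative", "dbscan")

    def __init__(self, cluster="agglomerative"):
        if cluster not in self.SUPPORTED:
            raise ValueError(
                f"unknown cluster option {cluster!r}; expected one of {self.SUPPORTED}"
            )
        super().__init__(cluster)
```

Validating the option up front turns a confusing `AttributeError` deep inside `fit` into an immediate, descriptive `ValueError`.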

Small question on recommended usage

This issue was opened in error.

"Attempt to get argmax of an empty sequence" error when Clustering dbscan

Hi, I'm getting this error when using clusteval with the cluster parameter set to "dbscan" on TF-IDF features. This is my code:

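# Imports needed by the snippet (not shown in the original post)
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from clusteval import clusteval

# grams: the collection of text documents, defined elsewhere in the poster's script
vectorizer = TfidfVectorizer(max_df=0.55, min_df=27)
X = vectorizer.fit_transform(grams)

# Reduce dimensionality with LSA, then L2-normalize the rows
svd = TruncatedSVD(int(X.shape[1] - 1))
normalizer = Normalizer(copy=False)
lsa = make_pipeline(svd, normalizer)
X = lsa.fit_transform(X)

# Cluster with DBSCAN and plot the evaluation
ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()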

This is the complete error log:

Traceback (most recent call last):
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/main/init.py", line 82, in
    ce.fit(X.toarray())
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/clusteval.py", line 153, in fit
    self.results = dbscan.fit(X, eps=None, epsres=50, min_samples=0.01, metric=self.metric, norm=True, n_jobs=-1, min_clust=self.min_clust, max_clust=self.max_clust, verbose=self.verbose)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/dbscan.py", line 97, in fit
    idx = np.argmax(silscores)
  File "<array_function internals>", line 6, in argmax
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1188, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
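
`np.argmax` raises this `ValueError` whenever it is given an empty array, so `silscores` was empty at line 97 of `dbscan.py`. The traceback does not show why; one plausible reason (an assumption) is that none of the candidate settings produced a clustering that could be scored. A minimal sketch of a guard around the call is below. `safe_argmax` is a hypothetical helper, not part of clusteval or NumPy.

```python
import numpy as np


def safe_argmax(scores):
    """Return the index of the highest score, or None if there are no scores."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        # np.argmax would raise ValueError here; signal "nothing to pick" instead.
        return None
    return int(np.argmax(scores))
```

A caller can then report "no valid clustering found" when the result is `None` instead of crashing.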

S_Dbw index

Hi Erdogan,

Would it be possible to integrate the S_Dbw index into the cluster evaluation? See https://pypi.org/project/s-dbw/#description and https://github.com/alashkov83/S_Dbw.

Best regards,
Nataliia

How to save the plots?

The charts are generated fine, but how can I save them to a local folder?
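
One general approach, assuming `ce.plot()` draws with matplotlib onto the current figure (the `figsize` argument visible in the tracebacks on this page hints at that, but the source does not confirm it), is to grab the active figure and save it. In the sketch below, `save_current_figure` is a hypothetical helper and the plotted values are made up as a stand-in for the `ce.plot()` call.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def save_current_figure(path, dpi=150):
    """Save whichever matplotlib figure is currently active to `path`."""
    fig = plt.gcf()
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


# Stand-in for ce.plot(): draw made-up scores on the current figure.
plt.plot([2, 3, 4, 5], [0.41, 0.63, 0.58, 0.49], marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Score")
```

Calling `save_current_figure("evaluation.png")` right after plotting writes the file. If the plotting call opens an interactive window first, the active figure may already be closed by the time you save, so save before showing.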

TypeError: plot() got an unexpected keyword argument 'width'

[HDBSCAN] Estimated number of clusters: 10
[HDBSCAN] Silhouette Coefficient: 0.780
[clusteval] >Fin.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-8b8378daba75> in <module>
      8 ce = clusteval(method='hdbscan')
      9 ce.fit(X)
---> 10 ce.plot()
     11 #ce.scatter(X)

/usr/local/lib/python3.7/site-packages/clusteval/clusteval.py in plot(self, figsize)
    151         elif self.method=='hdbscan':
    152             import clusteval.hdbscan as hdbscan
--> 153             hdbscan.plot(self.results, width=figsize[0], height=figsize[1])
    154 
    155     # Plot

TypeError: plot() got an unexpected keyword argument 'width'
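
The traceback shows the caller passing `width=` and `height=` to a `plot()` function that evidently does not declare those parameters. The sketch below reproduces that mismatch with hypothetical functions (not clusteval's code) and shows the matching call, assuming the callee accepts a `figsize` tuple.

```python
def plot(results, figsize=(15, 8)):
    """Hypothetical plotting helper that only accepts `figsize`."""
    return {"n_points": len(results), "figsize": figsize}


def call_with_width_height(results, figsize=(15, 8)):
    # Mirrors the failing call in the traceback: the callee has no `width` parameter.
    return plot(results, width=figsize[0], height=figsize[1])


def call_with_figsize(results, figsize=(15, 8)):
    # Passing the tuple under the name the callee declares works.
    return plot(results, figsize=figsize)
```

Mismatches like this typically appear when one module's signature changes and its caller is not updated in the same release.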

Method and metric

I see potential in this package and am inclined to use it in the future. Keep up the good work! Consider publishing the software in the SoftwareX journal.

I have a number of questions that I could not answer from the arguably short documentation:

It seems that I can pick either DBSCAN or, say, the silhouette score, but not both at the same time. This seems odd to me, because DBSCAN is a clustering method whose results could be evaluated with the silhouette score.
Related to that: how are clusters evaluated if I pick DBSCAN or HDBSCAN, and how are clusters computed if I pick the silhouette score or the Davies-Bouldin index?
How can I choose a different distance metric to plug into, e.g., DBSCAN or HDBSCAN?
How do I see which parameters were chosen?
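
The `dbscan.fit(..., eps=None, epsres=50, ...)` call and the `np.argmax(silscores)` line quoted in the argmax issue on this page suggest that the DBSCAN path tries several `eps` values, scores each clustering with the silhouette score, and keeps the best. A minimal sketch of that idea using scikit-learn directly is below. It is not clusteval's implementation; `best_dbscan_eps`, its parameters, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_dbscan_eps(X, eps_grid, min_samples=5, metric="euclidean"):
    """Return (eps, labels, score) with the highest silhouette score, or None."""
    best = None
    for eps in eps_grid:
        labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
        mask = labels != -1  # drop noise points before scoring
        n_clusters = len(set(labels[mask]))
        # The silhouette score needs at least 2 clusters and fewer clusters than samples.
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(X[mask], labels[mask], metric=metric)
        if best is None or score > best[2]:
            best = (eps, labels, score)
    return best


# Synthetic, well-separated blobs (illustrative data only).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
result = best_dbscan_eps(X, eps_grid=np.linspace(0.2, 3.0, 15))
```

In this sketch the returned tuple is what answers "which parameters were chosen": the winning `eps`, the labels, and the score. Passing a different `metric` string changes both the clustering distance and the scoring distance. Returning `None` when no `eps` yields a scorable clustering also avoids calling `argmax` on an empty list.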

AttributeError: 'clusteval' object has no attribute 'results'

[clusteval] >Fit using agglomerative with metric: euclidean, and linkage: ward

AttributeError                            Traceback (most recent call last)
in
      6
      7 # Fit to find optimal number of clusters using dbscan
----> 8 results= ce.fit(X)
      9
     10 # Make plot of the cluster evaluation

~\anaconda3\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    167
    168         # Compute the dendrogram threshold
--> 169         if (self.cluster!='kmeans') and (len(np.unique(self.results['labx']))>1):
    170             # print(self.results['labx'])
    171             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'
