clusteval's Issues

is pip not working anymore?

I get the following:
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000024B20CAB400>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/clusteval/
ERROR: Could not find a version that satisfies the requirement clusteval (from versions: none)
ERROR: No matching distribution found for clusteval
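On Windows, `[Errno 11001] getaddrinfo failed` is a name-resolution failure: pip most likely could not look up the package index host at all, so it saw no versions, which points at the network, proxy, or DNS setup rather than at the clusteval release itself. A minimal standard-library check, assuming the default index host `pypi.org` (the helper name `can_resolve` is hypothetical):

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the OS resolver can turn `host` into an address."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Same exception class behind pip's "getaddrinfo failed" message.
        return False
    return True
```

Calling `can_resolve("pypi.org")` from the same machine and environment that runs pip should return `False` if the problem is DNS; if it returns `True`, the proxy or firewall settings are the next thing to check.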

AttributeError: 'clusteval' object has no attribute 'results'... DUP of closed issue #5?

I've tried versions from 2.0.0 to beta, all resulting in the same issue as described in Issue 5.

I'm on the latest HDBSCAN version (0.8.27) and the beta version of clusteval.

My array is as follows:

umap_rs_embed[:10]

array([[-0.16568227,  2.3830128 ,  0.9952151 ],
       [-0.16470274,  0.91874045,  1.6843276 ],
       [-0.10057875,  1.0044663 ,  4.231984  ],
       [ 7.218489  ,  3.189865  ,  1.6015646 ],
       [ 1.7666751 ,  2.4235313 ,  1.2277056 ],
       [-0.02537769,  1.1624466 ,  4.175513  ],
       [ 1.4869809 , -0.8690608 ,  2.6568232 ],
       [-0.05031788, -0.30832335,  0.93605393],
       [ 1.2532264 ,  1.6826892 ,  0.4620979 ],
       [ 1.3145269 ,  1.3296161 ,  3.9630399 ]], dtype=float32)

And the result of fitting is:

# Import library
from clusteval import clusteval
import hdbscan
# Set the method
ce = clusteval(method='hdbscan')
# Evaluate
results = ce.fit(umap_rs_embed)
AttributeError                            Traceback (most recent call last)
<ipython-input-6-68904b3307b8> in <module>
      5 ce = clusteval(method='hdbscan')
      6 # Evaluate
----> 7 results = ce.fit(umap_rs_embed)

~\Desktop\scripting\apps\trajectory\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    170 
    171         # Compute the dendrogram threshold
--> 172         if (self.cluster!='kmeans') and (self.results['labx'] is not None) and (len(np.unique(self.results['labx']))>1):
    173             # print(self.results['labx'])
    174             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'
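The traceback shows `fit` reading `self.results['labx']` at line 172 although `results` was never assigned. One common way this happens is an attribute that is only set inside one branch of a method and then read unconditionally. The classes below are hypothetical stand-ins, not clusteval's code: `BrokenEvaluator` reproduces the pattern, and `GuardedEvaluator` initializes the attribute and fails with a clearer message instead.

```python
class BrokenEvaluator:
    """Hypothetical: `results` is only created for one method."""

    def __init__(self, method):
        self.method = method

    def fit(self, X):
        if self.method == "silhouette":
            self.results = {"labx": [int(x > 0) for x in X]}
        # Like line 172 of the traceback: reads self.results unconditionally.
        return len(set(self.results["labx"]))


class GuardedEvaluator(BrokenEvaluator):
    """Hypothetical fix: always define `results`, then check it."""

    def __init__(self, method):
        super().__init__(method)
        self.results = None  # defined even if no branch assigns it

    def fit(self, X):
        if self.method == "silhouette":
            self.results = {"labx": [int(x > 0) for x in X]}
        if self.results is None:
            raise ValueError(f"method={self.method!r} is not handled; no results were computed")
        return len(set(self.results["labx"]))
```

With `method="hdbscan"`, the first class raises the same `AttributeError` as in the report, while the second raises a `ValueError` naming the unhandled method.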

"Attempt to get argmax of an empty sequence" error when Clustering dbscan

Hi, I'm getting this error when using clusteval with the cluster parameter set to "dbscan" on TF-IDF features. This is my code:

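from clusteval import clusteval
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# grams: the list of input documents (defined elsewhere in my script)
vectorizer = TfidfVectorizer(max_df=0.55, min_df=27)
X = vectorizer.fit_transform(grams)

# LSA: reduce the TF-IDF matrix with truncated SVD, then L2-normalize the rows
svd = TruncatedSVD(int(X.shape[1] - 1))
normalizer = Normalizer(copy=False)
lsa = make_pipeline(svd, normalizer)
X = lsa.fit_transform(X)

ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()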

This is the complete error log:

Traceback (most recent call last):
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/main/init.py", line 82, in <module>
    ce.fit(X.toarray())
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/clusteval.py", line 153, in fit
    self.results = dbscan.fit(X, eps=None, epsres=50, min_samples=0.01, metric=self.metric, norm=True, n_jobs=-1, min_clust=self.min_clust, max_clust=self.max_clust, verbose=self.verbose)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/dbscan.py", line 97, in fit
    idx = np.argmax(silscores)
  File "<array_function internals>", line 6, in argmax
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1188, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
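The traceback ends at `np.argmax(silscores)` in clusteval's `dbscan.py`, so `silscores` was empty: no candidate setting produced a score to compare. Since the silhouette score is only defined for at least two clusters, a sweep in which every `eps` gives zero or one cluster (all noise, or everything merged) would leave nothing to pick from; that is an inference from the traceback, not from clusteval's source. The sketch below uses plain scikit-learn (`DBSCAN`, `silhouette_score`) instead of clusteval; the function name `best_eps_by_silhouette`, the `eps_grid` parameter, and dropping noise points before scoring are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def best_eps_by_silhouette(X, eps_grid, min_samples=5):
    """Return (best_eps, best_score), or None if no eps yields >= 2 clusters."""
    scores, kept_eps = [], []
    for eps in eps_grid:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels) - {-1})  # -1 marks noise points
        if n_clusters < 2:
            continue  # silhouette is undefined for fewer than two clusters
        mask = labels != -1
        scores.append(silhouette_score(X[mask], labels[mask]))
        kept_eps.append(eps)
    if not scores:
        # np.argmax([]) would raise "attempt to get argmax of an empty sequence"
        return None
    idx = int(np.argmax(scores))
    return kept_eps[idx], scores[idx]
```

Returning `None` lets the caller widen the `eps` grid or change `min_samples` instead of crashing; on normalized LSA vectors like the ones above, a grid spanning small distances is probably needed.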

TypeError: plot() got an unexpected keyword argument 'width'

[HDBSCAN] Estimated number of clusters: 10
[HDBSCAN] Silhouette Coefficient: 0.780
[clusteval] >Fin.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-8b8378daba75> in <module>
      8 ce = clusteval(method='hdbscan')
      9 ce.fit(X)
---> 10 ce.plot()
     11 #ce.scatter(X)

/usr/local/lib/python3.7/site-packages/clusteval/clusteval.py in plot(self, figsize)
    151         elif self.method=='hdbscan':
    152             import clusteval.hdbscan as hdbscan
--> 153             hdbscan.plot(self.results, width=figsize[0], height=figsize[1])
    154 
    155     # Plot

TypeError: plot() got an unexpected keyword argument 'width'
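Line 153 of the traceback shows the wrapper forwarding `width=` and `height=` to a `plot` function that does not declare those keywords, so Python rejects the call before any plotting happens. The functions below are hypothetical (not clusteval's code) and only illustrate the mismatch and the shape of a fix: forward the keyword the callee actually accepts.

```python
def plot_results(results, figsize=(15, 8)):
    """Hypothetical callee: accepts `figsize`, not `width`/`height`."""
    return {"n_clusters": results["n_clusters"], "figsize": figsize}


def plot_broken(results, figsize=(15, 8)):
    # Mirrors line 153: passes keywords the callee does not declare.
    return plot_results(results, width=figsize[0], height=figsize[1])


def plot_fixed(results, figsize=(15, 8)):
    # Forwards the keyword the callee actually declares.
    return plot_results(results, figsize=figsize)
```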

Method and metric

I sense potential in this package and am inclined to use it in the future. Keep up the good work! Consider publishing the software in the SoftwareX journal.

I have a number of questions that I could not answer from the arguably short documentation:

  • It seems to me that I can pick either DBSCAN or, say, the silhouette score, but not both at the same time. This seems odd, because DBSCAN is a clustering method whose results could be evaluated with the silhouette score.
  • Related to that: how are clusters evaluated if I pick DBSCAN or HDBSCAN, and how are clusters computed if I pick the silhouette score or the Davies-Bouldin index?
  • How could I choose different distance metrics to plug into e.g. DBSCAN or HDBSCAN?
  • How do I see which parameters got chosen?
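Outside clusteval, these questions come down to keeping the clustering step and the evaluation step separate. The sketch below uses plain scikit-learn, not clusteval's API: DBSCAN runs with a caller-chosen `metric`, its labels are scored with both the silhouette score and the Davies-Bouldin index, and the returned report records the parameters that were used. The function name `dbscan_with_scores` and the report layout are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import davies_bouldin_score, silhouette_score


def dbscan_with_scores(X, eps, min_samples=5, metric="euclidean"):
    """Cluster with DBSCAN under `metric`, then score the labels two ways."""
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
    mask = labels != -1  # drop noise points before scoring
    n_clusters = len(set(labels[mask]))
    report = {"eps": eps, "min_samples": min_samples, "metric": metric,
              "n_clusters": n_clusters, "silhouette": None, "davies_bouldin": None}
    if n_clusters >= 2:
        # Silhouette can use the same distance metric as the clustering step.
        report["silhouette"] = silhouette_score(X[mask], labels[mask], metric=metric)
        # Davies-Bouldin is centroid-based and Euclidean in scikit-learn.
        report["davies_bouldin"] = davies_bouldin_score(X[mask], labels[mask])
    return labels, report
```

For example, `dbscan_with_scores(X, eps=0.5, metric="manhattan")` clusters with Manhattan distance; a higher silhouette and a lower Davies-Bouldin value both indicate better-separated clusters.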

AttributeError: 'clusteval' object has no attribute 'results'

[clusteval] >Fit using agglomerative with metric: euclidean, and linkage: ward

AttributeError                            Traceback (most recent call last)
in
      6
      7 # Fit to find optimal number of clusters using dbscan
----> 8 results= ce.fit(X)
      9
     10 # Make plot of the cluster evaluation

~\anaconda3\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    167
    168         # Compute the dendrogram threshold
--> 169         if (self.cluster!='kmeans') and (len(np.unique(self.results['labx']))>1):
    170             # print(self.results['labx'])
    171             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'
