erdogant / clusteval
Clusteval provides methods for unsupervised cluster validation
Home Page: https://erdogant.github.io/clusteval
License: Other
I get the following
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000024B20CAB400>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/clusteval/
ERROR: Could not find a version that satisfies the requirement clusteval (from versions: none)
ERROR: No matching distribution found for clusteval
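The `getaddrinfo failed` part of the log means the hostname lookup itself failed, so pip never reached the package index, and `from versions: none` follows because no listing could be retrieved. A minimal sketch for checking name resolution from the same machine is shown below. The helper name `can_resolve` is hypothetical, and `pypi.org` is assumed to be the index in use because it is pip's default.

```python
import socket


def can_resolve(host, port=443):
    """Return True if the operating system can resolve `host` to an address."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Same failure class that pip reports as "getaddrinfo failed".
        return False
    return True


if __name__ == "__main__":
    # Assumption: pip is using its default index host.
    print("pypi.org resolvable:", can_resolve("pypi.org"))
```

If the lookup fails here as well, the problem most likely lies with DNS, a proxy, or a firewall rather than with the clusteval package.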
I've tried versions from 2.0.0 to beta, all resulting in the same issue as described in Issue 5.
I'm on the latest HDBSCAN version of 0.8.27, and the beta version of Clusteval...
My array is as follows:
umap_rs_embed[:10]
array([[-0.16568227, 2.3830128 , 0.9952151 ],
[-0.16470274, 0.91874045, 1.6843276 ],
[-0.10057875, 1.0044663 , 4.231984 ],
[ 7.218489 , 3.189865 , 1.6015646 ],
[ 1.7666751 , 2.4235313 , 1.2277056 ],
[-0.02537769, 1.1624466 , 4.175513 ],
[ 1.4869809 , -0.8690608 , 2.6568232 ],
[-0.05031788, -0.30832335, 0.93605393],
[ 1.2532264 , 1.6826892 , 0.4620979 ],
[ 1.3145269 , 1.3296161 , 3.9630399 ]], dtype=float32)
And the result of fitting is:
# Import library
from clusteval import clusteval
import hdbscan
# Set the method
ce = clusteval(method='hdbscan')
# Evaluate
results = ce.fit(umap_rs_embed)
AttributeError Traceback (most recent call last)
<ipython-input-6-68904b3307b8> in <module>
5 ce = clusteval(method='hdbscan')
6 # Evaluate
----> 7 results = ce.fit(umap_rs_embed)
~\Desktop\scripting\apps\trajectory\lib\site-packages\clusteval\clusteval.py in fit(self, X)
170
171 # Compute the dendrogram threshold
--> 172 if (self.cluster!='kmeans') and (self.results['labx'] is not None) and (len(np.unique(self.results['labx']))>1):
173 # print(self.results['labx'])
174 max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)
AttributeError: 'clusteval' object has no attribute 'results'
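The traceback shows `self.results` being read before it was ever assigned. One general way this kind of error arises is an attribute that is only set inside conditional branches, none of which match the configured option. This is not confirmed as the cause here. The toy classes below are hypothetical and are not clusteval's code. They only illustrate the pattern and one defensive alternative.

```python
class ToyEvaluator:
    """Hypothetical class: `results` is only assigned inside matching branches."""

    def __init__(self, cluster="agglomerative"):
        self.cluster = cluster

    def fit(self, X):
        if self.cluster == "agglomerative":
            self.results = {"labx": [0] * len(X)}
        elif self.cluster == "kmeans":
            self.results = {"labx": [1] * len(X)}
        # No else branch: an unrecognised option leaves `results` unset,
        # so the next line raises AttributeError.
        return self.results


class SaferToyEvaluator(ToyEvaluator):
    """Hypothetical variant that validates the option before fitting."""

    SUPPORTED = ("agglomerative", "kmeans")

    def fit(self, X):
        if self.cluster not in self.SUPPORTED:
            raise ValueError(
                f"Unknown cluster option {self.cluster!r}; expected one of {self.SUPPORTED}"
            )
        return super().fit(X)
```

With the validating variant, a mistyped or unsupported option produces a clear `ValueError` instead of a missing-attribute error.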
This was an error
Hi, I'm getting this error when using clusteval with the cluster parameter set to "dbscan" on TF-IDF features. This is my code:
vectorizer = TfidfVectorizer(max_df=0.55,min_df=27)
X = vectorizer.fit_transform(grams)
svd = TruncatedSVD(int(X.shape[1] - 1))
normalizer = Normalizer(copy=False)
lsa = make_pipeline(svd, normalizer)
X = lsa.fit_transform(X)
ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()
This is the complete error log:
Traceback (most recent call last):
File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/main/init.py", line 82, in
ce.fit(X.toarray())
File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/clusteval.py", line 153, in fit
self.results = dbscan.fit(X, eps=None, epsres=50, min_samples=0.01, metric=self.metric, norm=True, n_jobs=-1, min_clust=self.min_clust, max_clust=self.max_clust, verbose=self.verbose)
File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/dbscan.py", line 97, in fit
idx = np.argmax(silscores)
File "<array_function internals>", line 6, in argmax
File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1188, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out)
File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
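The final frame shows `np.argmax` being called on an empty `silscores`, which NumPy rejects with exactly this `ValueError`. Why the list ended up empty is not visible in the log. A minimal sketch of a guard is shown below. The helper name `best_score_index` is hypothetical and is not part of clusteval.

```python
import numpy as np


def best_score_index(silscores):
    """Return the index of the highest score, or None when there are no scores."""
    scores = np.asarray(silscores, dtype=float)
    if scores.size == 0:
        # np.argmax would raise ValueError on an empty array.
        return None
    return int(np.argmax(scores))
```

A caller can then treat `None` as "no valid clustering was found for any parameter setting" and report that, rather than failing inside NumPy.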
Hi Erdogan,
would it be possible to integrate the S_Dbw index (https://pypi.org/project/s-dbw/#description, https://github.com/alashkov83/S_Dbw) into the clustering evaluation?
Best regards,
Nataliia
The charts are generated correctly, but how do I save them to a local folder?
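One possible approach is sketched below. It assumes the charts are drawn with matplotlib and remain the current figure after the plotting call, which the thread does not confirm. The helper `save_current_figure` is hypothetical, and the plotted numbers are placeholders standing in for a call such as `ce.plot()`.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def save_current_figure(path, dpi=150):
    """Write the figure matplotlib currently considers active to `path`."""
    fig = plt.gcf()
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


# Placeholder chart standing in for the library's own plotting call.
plt.figure(figsize=(8, 4))
plt.plot([2, 3, 4, 5], [0.4, 0.8, 0.6, 0.5], marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Score")

out_path = os.path.join(tempfile.gettempdir(), "cluster_evaluation.png")
save_current_figure(out_path)
```

In a Jupyter notebook with the inline backend, the save call needs to be in the same cell as the plotting call, because the figure is closed once the cell finishes rendering.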
[HDBSCAN] Estimated number of clusters: 10
[HDBSCAN] Silhouette Coefficient: 0.780
[clusteval] >Fin.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-8b8378daba75> in <module>
8 ce = clusteval(method='hdbscan')
9 ce.fit(X)
---> 10 ce.plot()
11 #ce.scatter(X)
/usr/local/lib/python3.7/site-packages/clusteval/clusteval.py in plot(self, figsize)
151 elif self.method=='hdbscan':
152 import clusteval.hdbscan as hdbscan
--> 153 hdbscan.plot(self.results, width=figsize[0], height=figsize[1])
154
155 # Plot
TypeError: plot() got an unexpected keyword argument 'width'
I see potential in this package and I am inclined to use it in the future. Keep up the good work! Consider publishing the software in the SoftwareX journal.
I have a number of questions that I could not answer from the arguably short documentation:
AttributeError Traceback (most recent call last)
in
6
7 # Fit to find optimal number of clusters using dbscan
----> 8 results= ce.fit(X)
9
10 # Make plot of the cluster evaluation
~\anaconda3\lib\site-packages\clusteval\clusteval.py in fit(self, X)
167
168 # Compute the dendrogram threshold
--> 169 if (self.cluster!='kmeans') and (len(np.unique(self.results['labx']))>1):
170 # print(self.results['labx'])
171 max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)
AttributeError: 'clusteval' object has no attribute 'results'