scipy-cluster's Issues

Custom labels not working

What steps will reproduce the problem?
R = dendrogram(Z, labels = ['test' for i in range(0,150)])

/usr/lib64/python2.4/site-packages/hcluster/cluster.pyc in
_append_singleton_leaf_node(Z, p, n, level, lvs, ivl, leaf_label_func, i,
labels)
   2532             # for the leaf nodes, use it.
   2533             if labels is not None:
-> 2534                 ivl.append(labels[i-n])
   2535             else:
   2536                 # Otherwise, use the id as the label for the leaf.x

TypeError: list indices must be integers

Solution:
Change line 2534 to:
ivl.append(labels[int(i-n)])
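For completeness, a self-contained sketch of the reproduction (random data; it assumes any linkage over 150 observations hits the same code path):

import numpy as np
from hcluster import pdist, linkage, dendrogram

X = np.random.rand(150, 4)
Z = linkage(pdist(X), method='single')
# Without the int() cast, the float-valued leaf index makes labels[i-n] fail.
R = dendrogram(Z, labels=['test' for i in range(0, 150)])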

Original issue reported on code.google.com by [email protected] on 18 Feb 2008 at 1:46

C implementation of jaccard incorrect.

What steps will reproduce the problem?
`pdist(X, 'jaccard')`

What is the expected output? What do you see instead?
Compared with MATLAB's output, there are many differences. The alternative
Python implementation `pdist(X, 'test_jaccard')` generates the correct output.

I need to copy over the code from the Scipy repository and generate a new
release.
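A minimal comparison sketch (boolean test data is made up), relying on the pure-Python 'test_jaccard' path mentioned above:

import numpy as np
from hcluster import pdist

X = np.random.rand(5, 10) > 0.5                  # boolean observation matrix
# The two paths should agree once the C implementation is corrected.
print(np.allclose(pdist(X, 'jaccard'), pdist(X, 'test_jaccard')))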

Original issue reported on code.google.com by [email protected] on 18 Apr 2008 at 3:04

TypeError when running pdist

I'm running hcluster v0.2.0 under Linux (CentOS 5) with Python 2.4.3 and numpy 1.0.1.

I have 6 observations that each comprise ~45000 datapoints (i.e., a 6x45000 numpy array). I want to compute the euclidean distances between the 6 observations. When I try:

dists=pdist(data, 'seuclidean')

I encounter the following error:

File "...lib/python2.4/site-packages/hcluster/distance.py", line 1151, in pdist
VV = np.var(X, axis=0, ddof=1)
TypeError: var() got an unexpected keyword argument 'ddof'

Any ideas?
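One possible workaround sketch, assuming pdist honors a precomputed variance vector V for 'seuclidean' (as the pdist(X, metric, p, V, VI) signature suggests): compute the unbiased variance by hand, since np.var() in NumPy 1.0.1 has no ddof keyword.

import numpy as np
from hcluster import pdist

data = np.random.rand(6, 45000)                  # stand-in for the real data
n = data.shape[0]
V = np.var(data, axis=0) * n / (n - 1.0)         # sample variance without ddof
dists = pdist(data, 'seuclidean', V=V)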

Original issue reported on code.google.com by [email protected] on 5 Jan 2010 at 5:21

Can you build a Windows installer for Python 2.6 & 2.7?

I can "python setup.py install", but 

>>> import hcluster
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "hcluster\__init__.py", line 1, in <module>
    import hierarchy as _h
  File "hcluster\hierarchy.py", line 198, in <module>
    import _hierarchy_wrap, types
ImportError: No module named _hierarchy_wrap
>>> from hcluster import pdist, linkage, dendrogram
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "hcluster\__init__.py", line 1, in <module>
    import hierarchy as _h
  File "hcluster\hierarchy.py", line 198, in <module>
    import _hierarchy_wrap, types
ImportError: No module named _hierarchy_wrap

Original issue reported on code.google.com by [email protected] on 19 Jan 2013 at 7:57

easy_install broken on Snow Leopard

What steps will reproduce the problem?
1. easy_install hcluster

What is the expected output? What do you see instead?
Expected output is one nicely installed hcluster module.

What actually happens is:
cc1: error: unrecognized command line option "-Wno-long-double"

What version of the product are you using? On what operating system?
hcluster-0.2.0.zip
i686-apple-darwin10-gcc-4.2.1
OS X Snow Leopard

Original issue reported on code.google.com by [email protected] on 6 Nov 2009 at 10:02

Using custom color list

What steps will reproduce the problem?
1. R = dendrogram(Z, color_list=['brown' for i in range(0,150)])
2. All edges in dendrogram are just green

Solution ('hack'), at line 2474:
if color_list is None:
    color_list=[]

Colors are still added within the program, but the custom colors are used first. Maybe there is a better solution for this.

Original issue reported on code.google.com by [email protected] on 18 Feb 2008 at 1:49

wrong names of required packages for Ubuntu Gutsy


apt-get install scipy
apt-get install matplotlib

do not work on Ubuntu 7.10 (Gutsy Gibbon) with the default
/etc/apt/sources.list


To fix it:
apt-get install python-scipy
apt-get install python-matplotlib

Best

Darek Kedra

Original issue reported on code.google.com by [email protected] on 19 May 2008 at 3:28

Feature request

Could you provide data export functions for trees? Something like the Newick format, or another format supported by standalone tree-drawing programs.
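A rough sketch of how this could look on top of to_tree(); the node attributes used here (is_leaf(), left, right, dist, id) are assumed to match the tree nodes hcluster returns, and the branch-length convention is just one illustrative choice:

from hcluster import linkage, to_tree

def to_newick(node, labels=None):
    # Serialize a to_tree() result as a Newick string.
    if node.is_leaf():
        return labels[node.id] if labels else str(node.id)
    left = to_newick(node.left, labels)
    right = to_newick(node.right, labels)
    return '(%s:%.3f,%s:%.3f)' % (left, node.dist - node.left.dist,
                                  right, node.dist - node.right.dist)

Z = linkage([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])      # condensed distances, 5 points
print(to_newick(to_tree(Z)) + ';')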

Original issue reported on code.google.com by fccoelho on 23 Jun 2009 at 10:22

Broken Links to Tutorial and API Documentation

The links to the tutorial and API docs at http://users.soe.ucsc.edu/~eads are 
broken. This significantly increases the learning curve for this library. Could 
these please be moved to the google code site?

Original issue reported on code.google.com by [email protected] on 8 Apr 2011 at 1:40

patch for duplicate directories in sys.path

What steps will reproduce the problem?
1. python setup.py build with duplicate directories in sys.path


What is the expected output? What do you see instead?
$ python setup.py build
There are several valid include directories containing numpy/arrayobject.h


Traceback (most recent call last):
  File "setup.py", line 36, in <module>
    s = input('Selection [default=1]:' % s)
TypeError: not all arguments converted during string formatting

What version of the product are you using? On what operating system?
Mac OSX

Please provide any additional information below.

adding the following line to setup.py avoids this issue:

valid_paths = dict(map(lambda i: (i,1),valid_paths)).keys()
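An alternative sketch that also keeps the original sys.path order, so the first matching include directory still wins:

# Deduplicate while preserving the first occurrence of each path.
seen = set()
deduped = []
for p in valid_paths:
    if p not in seen:
        seen.add(p)
        deduped.append(p)
valid_paths = deduped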

Original issue reported on code.google.com by [email protected] on 10 Oct 2008 at 5:44

flat cluster numbering convention

The fcluster function returns clusters numbered 1..n. Python counts from 0, and so does scipy.cluster.vq.kmeans2.

If scipy-cluster is going to be included in scipy and vq is there to stay, it would be nice to unify the numbering.

Marcin
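In the meantime, a trivial workaround sketch (the threshold and criterion are illustrative, and scipy-style keywords are assumed):

from hcluster import linkage, fcluster

Z = linkage([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])            # condensed distances, 5 points
labels = fcluster(Z, t=2, criterion='maxclust') - 1    # now 0-based, like kmeans2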


Original issue reported on code.google.com by [email protected] on 23 Mar 2008 at 11:52

Snow Leopard breaks hcluster?

On a Mac OS X 10.6 (Snow Leopard) machine, with NumPy and other dependencies updated and current, importing hcluster leads to a segmentation fault. Re-installing did not help. Any advice?

Thx!


Original issue reported on code.google.com by [email protected] on 13 Sep 2009 at 6:50

bug in dendrogram

Hi, 

try the following code; it yields weird results (see the attached files).

Best regards,
Petr Danecek


---------------------------------------------------
from pylab import *
from hcluster import pdist, linkage, dendrogram
import numpy
from numpy.random import rand

Y = [174,181,218,150,199,205,119,212,121,148]
for i in range(len(Y)):
    Y[i] = (500-Y[i])/500.

Z = linkage(Y,method='complete')
dendrogram(Z)
print Z
savefig('_test-complete.png')

Z = linkage(Y,method='average')
dendrogram(Z)
print Z
savefig('_test-average.png')

Z = linkage(Y,method='weighted')
dendrogram(Z)
print Z
savefig('_test-weighted.png')

Z = linkage(Y,method='single')
dendrogram(Z)
print Z
savefig('_test-single.png')

---------------------------------------------------

Original issue reported on code.google.com by [email protected] on 15 Jan 2009 at 5:34

Attachments:

setup.py: return with non-zero code when failing

$ python setup.py install --root=installroot/
No paths in the python path contain numpy/arrayobject.h
$ echo $?
0
$

The exit code should be non-zero; this affects the behavior of programs (ones that automate building of Python packages, for instance) that rely on proper exit codes from setup.py.

The culprit is:

valid_paths = filter(contains_arrayobject_h, sys.path)
if len(valid_paths) == 0:
    print "No paths in the python path contain numpy/arrayobject.h"
    sys.exit(0) <--- change this to 1, or just 'raise RuntimeError'

Original issue reported on code.google.com by [email protected] on 1 Jun 2009 at 5:58

cluster.c includes malloc.h but should include stdlib.h

Running python setup.py:
hcluster/cluster.c:90:20: error: malloc.h: No such file or directory

Changing line 90 to

#include <stdlib.h>

makes it compile nicely. Still have to test the whole functionality, though ;)

BTW I'm on Mac OS 10.5.2, but this should apply to all recent versions.
See also: http://developer.apple.com/technotes/tn2002/tn2071.html

Kind regards,
Daniel

Original issue reported on code.google.com by [email protected] on 22 Mar 2008 at 3:45

incorrect docs in centroid, median, ward

Loki Davison noted that the docs say the centroid, median, and ward linkage
functions can take condensed distance matrices to do the linkage. This is
not correct and corrections to these docs will appear in the next release.

Damian

Original issue reported on code.google.com by [email protected] on 23 May 2008 at 1:18

Canberra distance broken

Using hcluster.pdist with the Canberra distance does not work, as the input matrix is converted to bool instead of double.

See the patch for a fix.
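Until a fixed release is out, a pure-NumPy fallback sketch (the handling of zero denominators here is my own choice, not hcluster's):

import numpy as np
from hcluster import squareform

def canberra_pdist(X):
    # Condensed Canberra distances: sum of |x_i - y_i| / (|x_i| + |y_i|).
    X = np.asarray(X, dtype=np.float64)
    m = len(X)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            num = np.abs(X[i] - X[j])
            den = np.abs(X[i]) + np.abs(X[j])
            mask = den != 0                        # skip 0/0 terms
            D[i, j] = D[j, i] = np.sum(num[mask] / den[mask])
    return squareform(D)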

P.S. Thanks for a great package.

Original issue reported on code.google.com by [email protected] on 1 Aug 2008 at 12:03

Attachments:

Segmentation fault python 2.6.5

What steps will reproduce the problem?
1. python -v
2. import hcluster


What do you see instead?
dlopen("/home/jgeiss/.local/lib/python2.6/site-packages/_hierarchy_wrap.so", 2);
Speicherzugriffsfehler (Segmentation fault)


What version of the product are you using? On what operating system?
hcluster0.2.0
python 2.6.5
Ubuntu 10.04.4 LTS
numpy 1.3.0
scipy 0.7.0
matplotlib 0.99.1.1

Please provide any additional information below.
It works with:
hcluster0.2.0,  
python 2.7.3 
ubuntu 12.04
numpy 1.6.1
scipy 0.9.0
matplotlib 1.1.1rc

Do I need to update any of the other packages?

Thanks,
Johanna

Original issue reported on code.google.com by [email protected] on 13 Jan 2015 at 7:59

Correlation distance errors, confusion and possible fix

What steps will reproduce the problem?

import numpy
import hcluster

x1 = numpy.random.randn(10,)
x2 = numpy.random.randn(10,)
xx = numpy.vstack((x1, x2))

# first error
hcluster.correlation(x1, x2)

# second error
hcluster.pdist(xx, 'correlation')


What is the expected output? What do you see instead?
I expected 1 minus the Pearson correlation coefficient.

Error #1
usr/lib/python2.5/site-packages/hcluster/cluster.py in correlation(u, v)
    830     vm = v - vmu
    831     return 1.0 - (scipy.dot(um, vm.T) / (math.sqrt(scipy.dot(um,
vm).T)) \
--> 832             * math.sqrt(scipy.dot(vm, vm.T)))

Error #2
usr/lib/python2.5/site-packages/hcluster/cluster.py in pdist(X, metric, p,
V, VI)
   1372             dm = squareform(dm)
   1373         elif mstr in set(['correlation', 'co']):
-> 1374             X2 = X - numpy.repmat(numpy.mean(X, axis=1).reshape(m,
1), 1, n)
   1375             norms = numpy.sqrt(numpy.sum(X2 * X2, axis=1))
   1376             _cluster_wrap.pdist_cosine_wrap(X2, dm, norms)

<type 'exceptions.AttributeError'>: 'module' object has no attribute 'repmat'


What version of the product are you using? On what operating system?
Python2.5, numpy1.0.5.dev, hcluster (current svn), linux, 32


Please provide any additional information below.

I don't really get the documentation with the Manhattan norm and all :-), I'm just assuming 1 minus the Pearson correlation coefficient. If that's right, here is my fix (diff):

- cluster.py  (revision 90)
+++ cluster.py  (working copy)
@@ -828,8 +828,8 @@
     umu = u.mean()
     um = u - umu
     vm = v - vmu
-    return 1.0 - (scipy.dot(um, vm.T) / (math.sqrt(scipy.dot(um, vm).T)) \
-            * math.sqrt(scipy.dot(vm, vm.T)))
+    return 1.0 - (scipy.dot(um, vm.T) / ((math.sqrt(scipy.dot(um, um.T))) \
+            * math.sqrt(scipy.dot(vm, vm.T))))

 def hamming(u, v):
     """
@@ -1371,7 +1371,7 @@
             dm[xrange(0,m),xrange(0,m)] = 0
             dm = squareform(dm)
         elif mstr in set(['correlation', 'co']):
-            X2 = X - numpy.repmat(numpy.mean(X, axis=1).reshape(m, 1), 1, n)
+            X2 = X - X.mean(1)[:,numpy.newaxis]
             norms = numpy.sqrt(numpy.sum(X2 * X2, axis=1))
             _cluster_wrap.pdist_cosine_wrap(X2, dm, norms)
         elif mstr in set(['mahalanobis', 'mahal', 'mah']):



Arnar
[email protected]
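A quick sanity-check sketch against NumPy (the corrected formula should reduce to 1 minus Pearson's r):

import numpy as np

x1 = np.random.randn(10)
x2 = np.random.randn(10)
r = np.corrcoef(x1, x2)[0, 1]
# With the patch above, hcluster.correlation(x1, x2) should be close to 1.0 - r.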

Original issue reported on code.google.com by [email protected] on 20 Feb 2008 at 3:14

String distances

I am interested in using this package to cluster sequences. I noticed in the TODO file that you want to do this as well. One place to start is the implementation of the Levenshtein edit distance from the py-editdist package. In addition, there is a normalized edit distance that can easily be implemented from that, described in this paper: IEEE Trans. Pattern Anal. Mach. Intell. 29(6):1091.
I'll see about writing it myself, but my C is quite rusty.
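A quick pure-Python sketch of the plain (un-normalized) edit distance, in case it helps as a starting point:

def levenshtein(a, b):
    # Classic dynamic-programming Levenshtein edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein('kitten', 'sitting'))   # -> 3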

Original issue reported on code.google.com by [email protected] on 27 Feb 2009 at 1:21

dendrogram display via matplotlib not working on Mac

What steps will reproduce the problem?
1. compute clusters
2. execute 'dendrogram' command

What is the expected output? What do you see instead?

I expected to see a dendrogram drawn with Matplotlib. Instead, there is a small delay while some kind of calculation takes place, but no output window appears. Matplotlib works fine when invoked directly in other scripts on my system.

What version of the product are you using? On what operating system?

I've tried this on two Mac systems, a MacBook Pro and a Mac Pro (both Intel processors). The OS version is 10.5.4. Python installed on both is 2.5.1. Matplotlib is 0.98pre. I've not tried it on Linux or Windows, and suspect it may be a platform issue.
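One thing worth ruling out (a guess, not a confirmed diagnosis): dendrogram() draws onto the current matplotlib figure but does not open a window itself, so an explicit show() or savefig() may be needed depending on the backend.

import numpy as np
import matplotlib.pyplot as plt
from hcluster import pdist, linkage, dendrogram

X = np.random.rand(10, 3)                 # made-up data for illustration
dendrogram(linkage(pdist(X)))
plt.show()                                # or plt.savefig('dendrogram.png')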


Original issue reported on code.google.com by [email protected] on 12 Sep 2008 at 2:31

fclusterdata does not work (I mean completely)

What steps will reproduce the problem?
1. Try to use fclusterdata

The arguments in the function header are different from those in the function body; it simply cannot work.

hcluster-0.1.14, Linux

Cheers once more,
Marcin

Original issue reported on code.google.com by [email protected] on 19 Mar 2008 at 9:05

numpy float32 array are handled incorrectly

What steps will reproduce the problem?
1. Feed a float32 array into pdist

no errors are raised, but pdist returns an array with meaningless results.

What is the expected output? What do you see instead?
scipy-cluster should
 - upgrade float32 to float64 or use float32 natively

What version of the product are you using? On what operating system?
 hcluster-0.1.4 on linux compiled from source

Cheers,
Marcin
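Until that lands, an explicit upcast is a simple workaround sketch:

import numpy as np
from hcluster import pdist

X32 = np.random.rand(6, 4).astype(np.float32)
d = pdist(X32.astype(np.float64))         # upcast before pdist sees the data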



Original issue reported on code.google.com by [email protected] on 19 Mar 2008 at 9:03

Installing on Ubuntu 13.10 pip install hangs

What steps will reproduce the problem?
1. As part of the crosscat install for the new BayesDB project (although the same happens if hcluster is installed independently via pip install), setup.py just hangs; the pip log shows no errors (Python 2.7.5+).

What is the expected output? What do you see instead?
Wish I had more info to provide..


What version of the product are you using? On what operating system?
PIP install of 0.20

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 7 Dec 2013 at 10:45

dendrogram image generation

What steps will reproduce the problem?
1. dendrogram(Z)

What is the expected output? What do you see instead?
I expected to get an image of the dendrogram. I'm getting just a dictionary containing the colors, coords, etc. How can one get the image? I guess via matplotlib, but I could not manage to do it.
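For reference, a minimal sketch (random data for illustration): dendrogram() returns the coordinates and colors as a dict, but it also draws onto the current matplotlib/pylab figure, which can then be saved or shown.

import numpy as np
from pylab import figure, savefig
from hcluster import pdist, linkage, dendrogram

X = np.random.rand(10, 3)
figure()
R = dendrogram(linkage(pdist(X)))   # R is the dict of coords/colors
savefig('dendrogram.png')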

What version of the product are you using? On what operating system?
hcluster 0.1.9
Ubuntu



Original issue reported on code.google.com by [email protected] on 23 Sep 2008 at 6:25

upper triangular and squareform of same distance matrix yield different linkage solutions

After producing an upper triangular (condensed) distance matrix with pdist, I used squareform to transform the matrix to square form before using it as input to linkage. So for

Y = pdist(data)
Y_sq = squareform(Y)

linkage(Y_sq) does NOT equal linkage(Y), whereas I expected linkage(Y_sq) == linkage(Y).

I have only read documentation indicating that Y (the upper triangle) is the standard input to linkage, but using Y_sq yields the result I was expecting (maybe just a fluke?). MATLAB's linkage does not accept Y_sq as input. What goes on when I input Y_sq? Why is the result different from using Y?
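A hedged illustration of one likely explanation (an assumption about the library's behavior, not something confirmed in this thread): when linkage() receives a 2-D array, it appears to treat each row as an observation vector and recompute pairwise distances internally, so linkage(Y_sq) would cluster the rows of the distance matrix rather than the original observations.

import numpy as np
from hcluster import pdist, linkage, squareform

data = np.random.rand(6, 4)
Y = pdist(data)
Y_sq = squareform(Y)
# If the 2-D input really is re-fed through pdist, these two should match:
print(np.allclose(linkage(Y_sq), linkage(pdist(Y_sq))))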




What version of the product are you using? On what operating system?
I am using hcluster 0.2.0 Mac 10.5



Original issue reported on code.google.com by [email protected] on 14 Jul 2009 at 9:19

pre_order for subtrees fails

When trying to list all the leaves in a given subtree of the merge tree created by to_tree, it fails in some cases with 'Index out of bounds'.

What steps will reproduce the problem?
Run the following code to get the exception:

from hcluster import *
dist = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
x = linkage(dist)
t = to_tree(x)
t.right.right.pre_order()

What version of the product are you using? On what operating system?

I'm using the port py26-hcluster version 0.2.0 on OS X 10.6 with Python 2.6.

Please provide any additional information below.

Looking at hierarchy.py (revision 132), it looks like the problem is that the lists lvisited and rvisited (lines 752 and 753) are indexed using node IDs, while their size is 2*n (where n is the size of the subtree at that node). If the tree is large enough, the node ID of (for example) the rightmost leaf is actually larger than 2*n for most subtrees containing it.

Original issue reported on code.google.com by [email protected] on 20 Jul 2010 at 9:30

Need python-dev for Ubuntu 10.10

I'm not sure this is the right place for this, but I had some issues compiling from source, as I needed to install the python-dev package beforehand. I guess you should add it to the list of required packages for Ubuntu (at least 10.10).
Best,
Clément

Original issue reported on code.google.com by clement.grimal on 9 Nov 2010 at 3:44

"built-in" functions don't work with pdist()

When you give pdist() a function (instead of a string), it confirms this by checking that it's a "types.FunctionType". However, it is useful to be able to provide a C-compiled function, which has type "types.BuiltinFunctionType". Therefore, I suggest that

    if mtype is types.FunctionType:

(on line 1079 in distance.py in revision 132) should be changed to :

    if mtype in (types.FunctionType, types.BuiltinFunctionType):

or even

    if hasattr(metric, '__call__'):

so as to allow any callable.

Original issue reported on code.google.com by [email protected] on 10 Feb 2011 at 4:17

Feature Request: Vector quantisation

Not sure if this is already doable, but some docs on that usage case would be great. I.e., like the scipy.cluster.vq function: given a feature vector, which cluster does it fall in?

Thanks,
Loki
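Until something official exists, a rough sketch of the idea; the centroid-based assignment below is my own simplification, not an hcluster API:

import numpy as np
from hcluster import pdist, linkage, fcluster

X = np.random.rand(20, 3)
labels = fcluster(linkage(pdist(X)), t=3, criterion='maxclust')
# Mean of each flat cluster, then nearest-centroid assignment for a new vector.
centroids = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])
new_vec = np.random.rand(3)
nearest = int(np.argmin(((centroids - new_vec) ** 2).sum(axis=1)))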

Original issue reported on code.google.com by [email protected] on 21 May 2008 at 7:08

Issues/Fixes for building on Python 2.7 amd64 on Windows

Building on Windows with Visual Studio 2008 for 64-bit, I encountered the following portability problems with hcluster 0.20's C code.

* Use of the 'inline' keyword in distance.c/hierarchy.c - Python 2.7 requires building with VS2008, which only accepts 'inline' as a keyword if the file is named "*.cpp". Changing it to "__inline" in all cases fixed the problem. Not sure how to make it portable using distutils (a sketch is included at the end of this report); I'd expect that's a common need. See http://msdn.microsoft.com/en-us/library/z8y1yy88(v=vs.90).aspx

* Some variable declarations aren't C89-friendly (which is what VS2008 adheres to :( ). Moving them to the start of the block in all cases doesn't hurt readability much and does make it portable enough:

    ...
    xi = inds[i];
    cnode *xnd = info->nodes + xi;
    xn = xnd->n;
    ...

->

    cnode *xnd;
    ...
    xi = inds[i];
    xnd = info->nodes + xi;
    xn = xnd->n;

After this, I was able to successfully build it.
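One possible distutils sketch for the 'inline' point above; the extension and source names are placeholders, and detecting MSVC only via sys.platform is a simplification (a real setup.py would inspect the compiler):

import sys
from distutils.core import setup, Extension

define_macros = []
if sys.platform == 'win32':
    # Spell 'inline' the way MSVC's C compiler accepts it.
    define_macros.append(('inline', '__inline'))

ext = Extension('hcluster._hierarchy_wrap',             # placeholder name
                sources=['hcluster/hierarchy_wrap.c'],  # placeholder source
                define_macros=define_macros)

setup(name='hcluster', ext_modules=[ext])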


Original issue reported on code.google.com by [email protected] on 6 Jan 2014 at 9:42

fclusterdata segfault

What steps will reproduce the problem?
1. Run fclusterdata() on the points loaded from the csv file:


import numpy as np
import hcluster as hc

data = np.loadtxt('/tmp/segfault.csv')
hc.fclusterdata(data, 1)

I get a segfault after a few seconds. Can anyone reproduce? I get the error on the two machines I have tried so far. The problem is that it needs a good amount of RAM to run.

Machine is Linux #59-Ubuntu SMP x86_64 GNU/Linux.
hcluster 0.2.0, python 2.6.5
numpy 1.6.1

If there is any way I could help, please let me know.

Original issue reported on code.google.com by [email protected] on 21 Sep 2011 at 1:33

Attachments:

Incremental clustering

Hi,

 first of all, thanks for the great module! 

 I was wondering whether it is possible to do incremental HAC with hcluster, or if there is any quick workaround.

 Thanks in advance,
 Manos

Original issue reported on code.google.com by [email protected] on 21 Sep 2009 at 10:00
