guyallard / markov_clustering

markov clustering in python

License: MIT License

Python 100.00%

clustering markov-clustering networks python

markov_clustering's Issues

How does MCL treat weight of edges?

Thank you for creating such a wonderful tool for our community!

I have a file in which each line describes one edge (for example: nodeX, nodeY, weight). The higher the weight, the shorter the distance between nodeX and nodeY.
I converted this file into a matrix using networkx (the add_weighted_edges_from and to_scipy_sparse_matrix functions). How does MCL treat the edge weights?

Thank you very much.

Regards,
Hai
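
One way to read the weights, as a minimal sketch under stated assumptions: the inflation traceback further down this page shows the package's normalize() calling sklearn.preprocessing.normalize(matrix, norm="l1", axis=0), so each column is rescaled into random-walk transition probabilities. Under that reading a larger weight means a stronger link (a similarity), not a longer distance. The 3-node matrix below is made up for illustration, and the self-loops that run_mcl adds through loop_value are left out.

```python
import numpy as np

# Made-up symmetric weighted adjacency matrix for nodes 0, 1, 2.
# Entry [i, j] is the weight of the edge between node i and node j.
adjacency = np.array([
    [0.0, 3.0, 1.0],
    [3.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

# L1-normalise every column so it sums to 1, giving a column-stochastic
# matrix: entry [i, j] is the probability of stepping from node j to node i.
transition = adjacency / adjacency.sum(axis=0, keepdims=True)

# Column for node 0: the weight-3 edge gets 0.75, the weight-1 edge gets 0.25.
print(transition[:, 0])
```

If the weights in a file are distances rather than similarities, they would need to be converted (for example inverted) before clustering; which conversion is appropriate depends on the data.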

matplotlib window disappearing

I used the example script and a data set of my own, and in both cases the graph plot popped up and closed itself almost immediately. I have no issues with matplotlib anywhere else.

I am using Python 3.6, and all modules are up to date since I reinstalled them along with markov_clustering.

Some nodes belong to more than one cluster.

After run_mcl and get_clusters, I found that one node belongs to two clusters. Is this method doing a soft clustering?
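
As far as I know, MCL normally produces a hard partition, and overlap shows up only in rare cases, for instance when a node is attracted equally strongly to two clusters. A small helper like the one below can list the affected nodes so they can be inspected or resolved by hand. It assumes that get_clusters returns a list of tuples of node indices; overlapping_nodes is a hypothetical helper, not part of the package.

```python
from collections import Counter

def overlapping_nodes(clusters):
    """Return the sorted nodes that appear in more than one cluster."""
    # set(cluster) guards against a node being listed twice in one cluster.
    counts = Counter(node for cluster in clusters for node in set(cluster))
    return sorted(node for node, seen in counts.items() if seen > 1)

# Made-up clustering in which node 2 sits in two clusters.
clusters = [(0, 1, 2), (2, 3, 4), (5,)]
shared = overlapping_nodes(clusters)
print(shared)  # [2]
```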

What does the format of input data look like?

The example given includes 200 nodes and 2D coordinates. I am new to Markov Clustering. I can get sequence-to-sequence similarity from BLAST for each gene pair. For example, my current data looks like this:
Gene1 Gene2 60
Gene1 Gene3 70
Gene1 Gene4 65
...
Gene3 Gene4 34
How can I convert this to the format of the example given in the manual? Thank you!
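
The README example passes a square adjacency matrix to run_mcl, so one possible route is to map each gene name to a row index and fill a symmetric sparse matrix with the scores. The sketch below uses only the four rows shown above and assumes the scores are similarities on an undirected graph; the variable names are illustrative.

```python
from scipy.sparse import coo_matrix

# The rows shown in the question (the elided rows are not reproduced).
lines = [
    "Gene1 Gene2 60",
    "Gene1 Gene3 70",
    "Gene1 Gene4 65",
    "Gene3 Gene4 34",
]

names = []   # row index -> gene name
index = {}   # gene name -> row index
rows, cols, vals = [], [], []

for line in lines:
    gene_a, gene_b, score = line.split()
    for gene in (gene_a, gene_b):
        if gene not in index:
            index[gene] = len(names)
            names.append(gene)
    i, j = index[gene_a], index[gene_b]
    # Store both directions so the matrix is symmetric (undirected graph).
    rows += [i, j]
    cols += [j, i]
    vals += [float(score), float(score)]

size = len(names)
matrix = coo_matrix((vals, (rows, cols)), shape=(size, size)).tocsr()
print(matrix.toarray())
```

The resulting matrix could then be passed to mc.run_mcl, and names can translate the node indices in each cluster back to gene names.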

Weighted graph support

I'm trying to use this implementation of the algorithm on a weighted graph.

Does the library support it? If yes, how can I use that feature?

markov_clustering.modularity raises an error when the first argument is an np.ndarray

result = mc.run_mcl(score, **kwargs)
clusters = mc.get_clusters(result) 
modularity = mc.modularity(matrix=result, clusters=clusters)

Since score is an np.ndarray (np.matrix is deprecated and the data is not sparse), result is also an ndarray. This causes mc.modularity to crash in its call to convert_to_adjacency_matrix().

I found a quick fix, which was to use

modularity = mc.modularity(matrix=csr_matrix(result), clusters=clusters)

However, in cases where some of the data doesn't belong to any cluster, this crashes.
The code works correctly with
modularity = mc.modularity(matrix=np.matrix(result), clusters=clusters)
However, np.matrix is deprecated.

The fix is fairly trivial:

if isspmatrix(matrix):
    col = find(matrix[:, i])[2]
else:
    col = matrix[:, i].T.tolist()[0]

becomes

if isspmatrix(matrix):
    col = find(matrix[:, i])[2]
elif isinstance(matrix, np.ndarray):
    col = matrix[:, i].T.tolist()
else:
    col = matrix[:, i].T.tolist()[0]

I could raise a PR, but it's only two lines of code, so I'm not sure which is easier.

Many thanks
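
The difference between the two branches comes down to how NumPy slices a column. The snippet below is a small illustration with a made-up 2x2 array, not code from the package: a column of a plain ndarray is 1-D, while a column of an np.matrix stays 2-D, so the trailing [0] only makes sense for the latter.

```python
import numpy as np

dense = np.array([[0.5, 0.0],
                  [0.5, 1.0]])

# Column 0 of a plain ndarray is 1-D, so tolist() is already a flat list.
col_array = dense[:, 0].T.tolist()

# Column 0 of an np.matrix stays 2-D, so tolist() is nested and needs [0].
col_matrix = np.matrix(dense)[:, 0].T.tolist()

print(col_array)     # [0.5, 0.5]
print(col_matrix)    # [[0.5, 0.5]]
print(col_array[0])  # 0.5, a float: iterating over it raises TypeError
```

Applying [0] to the ndarray result yields a single float, which matches the "'float' object is not iterable" TypeError reported in another issue on this page.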

Hyperparameter Tuning

I have two questions.

In the example using the modularity, only the inflation is varied. When would we vary the expansion? Is there a case where we would want a large expansion and a large inflation (as they act in opposite ways)?
Is there a way of getting a rough range for this value (given an adjacency matrix)? In my use case I needed an inflation value of 20 to get the appropriate clustering. I then squared the adjacency matrix and found that an inflation of about 5 worked well. I am trying to build this code into a pipeline, so I won't be able to oversee the selection of groups, and I worry that if my range of hyperparameters is off I may not find optimal values (say, if my data changes).

Many thanks
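
For intuition about what the inflation value does: the tracebacks quoted on this page show inflate() as normalize(np.power(matrix, power)) and expand() as matrix ** power. The sketch below applies the inflation step to a single made-up column; inflate_column is a hypothetical helper written for this illustration, not the package's function.

```python
import numpy as np

def inflate_column(column, power):
    """Raise each entry to `power`, then rescale so the column sums to 1."""
    powered = np.power(column, power)
    return powered / powered.sum()

column = np.array([0.5, 0.3, 0.2])  # made-up transition probabilities

# Higher powers push more of the probability mass onto the largest entry.
for power in (1.5, 2.0, 5.0, 20.0):
    print(power, np.round(inflate_column(column, power), 4))
```

Higher inflation concentrates each column on its strongest entry, which generally yields more and smaller clusters, while expansion spreads flow along longer paths. A suitable range therefore depends on how evenly the weights in each column are distributed.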

Error Plotting

When I run mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver"), I get this error: AttributeError: module 'matplotlib.cm' has no attribute 'tab20'.
How do I fix this? Thank you.

Python 2.*

Currently I have to use Python 2.*. Does this package not work on the older version of Python? If not, would it be possible to make it work with Python 2.*?

Cheers

ValueError: shape mismatch in assignment.

I just tried to run the example code given in the GitHub README.

import markov_clustering as mc

import networkx as nx
import random

# number of nodes to use
numnodes = 200

# generate random positions as a dictionary where the key is the node id and the value
# is a tuple containing 2D coordinates
positions = {i:(random.random() * 2 - 1, random.random() * 2 - 1) for i in range(numnodes)}

# use networkx to generate the graph
network = nx.random_geometric_graph(numnodes, 0.3, pos=positions)

# then get the adjacency matrix (in sparse form)
matrix = nx.to_scipy_sparse_matrix(network)
result = mc.run_mcl(matrix)

Gives the following error:

Traceback (most recent call last):
  File "/home/baqir/code/email-sentiment-analysis/algorithms/markov_clustering_visual.py", line 22, in <module>
    result = mc.run_mcl(matrix)
  File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/markov_clustering/mcl.py", line 233, in run_mcl
    matrix = prune(matrix, pruning_threshold)
  File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/markov_clustering/mcl.py", line 93, in prune
    pruned[matrix >= threshold] = matrix[matrix >= threshold]
  File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/scipy/sparse/_index.py", line 109, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment

I do not understand whether this is a scipy error or a markov_clustering error in passing valid arguments.

module 'markov_clustering' has no attribute 'draw_graph'

Hello,
I am trying to use draw_graph, but I get this error:
module 'markov_clustering' has no attribute 'draw_graph'.
The same code works in Google Colab but not in Jupyter Notebook.
Could you please help me with this issue?
Thank you

markov_clustering, possible error in instructions for computing modularity Q

When calculating modularity Q, I get the following error in my script that uses the markov_clustering package:

Traceback (most recent call last):
  File "mcl_3_inflat_test.py", line 48, in <module>
    Q = mc.modularity(matrix=result, clusters=clusters)
  File "/usr/local/lib/python3.7/site-packages/markov_clustering/modularity.py", line 74, in modularity
    matrix = convert_to_adjacency_matrix(matrix)
  File "/usr/local/lib/python3.7/site-packages/markov_clustering/modularity.py", line 39, in convert_to_adjacency_matrix
    coeff = max( Fraction(c).limit_denominator().denominator for c in col )
TypeError: 'float' object is not iterable

Shouldn't

Q = mc.modularity(matrix=result, clusters=clusters)

as written in the instructions, instead be:

Q = mc.modularity(matrix, clusters=clusters) ?

Or otherwise...?

Thanks a lot.

Shape mismatch in assignment

ValueError Traceback (most recent call last)
in
----> 1 result = mc.run_mcl(matrix) # run MCL with default parameters
2 clusters = mc.get_clusters(result) # get clusters

~\Anaconda3\lib\site-packages\markov_clustering\mcl.py in run_mcl(matrix, expansion, inflation, loop_value, iterations, pruning_threshold, pruning_frequency, convergence_check_frequency, verbose)
231 if pruning_threshold > 0 and i % pruning_frequency == pruning_frequency - 1:
232 printer.print("Pruning")
--> 233 matrix = prune(matrix, pruning_threshold)
234
235 # Check for convergence

~\Anaconda3\lib\site-packages\markov_clustering\mcl.py in prune(matrix, threshold)
91 if isspmatrix(matrix):
92 pruned = dok_matrix(matrix.shape)
---> 93 pruned[matrix >= threshold] = matrix[matrix >= threshold]
94 pruned = pruned.tocsc()
95 else:

~\Anaconda3\lib\site-packages\scipy\sparse\_index.py in __setitem__(self, key, x)
122 x, _ = _broadcast_arrays(x, i)
123 if x.shape != i.shape:
--> 124 raise ValueError("shape mismatch in assignment")
125 if x.size == 0:
126 return

ValueError: shape mismatch in assignment

How to find a new sample belongs to which cluster?

Hi,

I am using it to identify clusters of cytology cells. I have feature set for around 50K cells which I can use to find clusters using the example provided. But what if I have got a number of new cells that I want to check which cluster they belong to. How can I find their cluster?

Ruqayya
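
Nothing on this page shows a predict-style function in the package, and MCL works on the whole graph rather than fitting a reusable model. One heuristic, named plainly as such and not a feature of the package, is to compute the new cell's similarity to the already clustered cells and assign it to the cluster with the highest mean similarity. The sketch below assumes clusters is a list of tuples of node indices, and that similarities to existing nodes come from whatever measure was used to build the original graph; assign_to_cluster and the sample values are hypothetical.

```python
def assign_to_cluster(similarity_to_nodes, clusters):
    """Return the index of the cluster with the highest mean similarity.

    similarity_to_nodes maps node index -> similarity to the new sample;
    nodes missing from the mapping count as similarity 0.
    Clusters are assumed to be non-empty.
    """
    best_index, best_score = None, float("-inf")
    for cluster_index, cluster in enumerate(clusters):
        total = sum(similarity_to_nodes.get(node, 0.0) for node in cluster)
        score = total / len(cluster)
        if score > best_score:
            best_index, best_score = cluster_index, score
    return best_index

clusters = [(0, 1, 2), (3, 4)]
new_cell_similarity = {0: 0.1, 1: 0.2, 3: 0.9, 4: 0.7}
assigned = assign_to_cluster(new_cell_similarity, clusters)
print(assigned)  # 1
```

When many new cells arrive, re-running the clustering on the combined data is the more faithful option, since new nodes can change the cluster structure.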

Attribute error: Argmax not found

Hi!
I am passing an undirected, weighted networkx graph, g:
matrix = nx.to_scipy_sparse_matrix(g)

But when I try to run this code:
result = mc.run_mcl(matrix, inflation=1.4)
clusters = mc.get_clusters(result)
mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver")

it gives me an Attribute error: Argmax not found

Do you know why it's happening?

Appreciate the help :)

markov_clustering does not work with scipy 1.3.0

I just copied the README example into a Python console:

Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import markov_clustering as mc
Matplotlib not present
Visualization not supported to missing libraries.
>>> import networkx as nx
>>> import random
>>> 
>>> # number of nodes to use
... numnodes = 200
>>> 
>>> # generate random positions as a dictionary where the key is the node id and the value
... # is a tuple containing 2D coordinates
... positions = {i:(random.random() * 2 - 1, random.random() * 2 - 1) for i in range(numnodes)}
>>> 
>>> network = nx.random_geometric_graph(numnodes, 0.3, pos=positions)
>>> 
>>> # then get the adjacency matrix (in sparse form)
... matrix = nx.to_scipy_sparse_matrix(network)
>>> result = mc.run_mcl(matrix)           # run MCL with default parameters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/env/export/cns_n02_scratch/scratch_agc/scratch/agc/users/ggautrea/miniconda3/lib/python3.7/site-packages/markov_clustering-0.0.6.dev0-py3.7.egg/markov_clustering/mcl.py", line 233, in run_mcl
    matrix = prune(matrix, pruning_threshold)
  File "/env/export/cns_n02_scratch/scratch_agc/scratch/agc/users/ggautrea/miniconda3/lib/python3.7/site-packages/markov_clustering-0.0.6.dev0-py3.7.egg/markov_clustering/mcl.py", line 93, in prune
    pruned[matrix >= threshold] = matrix[matrix >= threshold]
  File "/env/export/cns_n02_scratch/scratch_agc/scratch/agc/users/ggautrea/miniconda3/lib/python3.7/site-packages/scipy/sparse/_index.py", line 109, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment
>>> clusters = mc.get_clusters(result)    # get clusters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'result' is not defined
>>> import scipy
>>> scipy.__version__
'1.3.0'

But if I use scipy 1.2.0 instead of scipy 1.3.0, it works fine.
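
The failing line assigns through a boolean mask into a dok_matrix, which is the operation scipy 1.3.0 rejects here. A thresholding step can also be written without masked assignment by editing the stored values of a CSC copy directly. The function below is a sketch of that idea only: prune_below is a hypothetical name, it is not the package's fix, and it implements nothing beyond the thresholding visible in the traceback.

```python
import numpy as np
from scipy.sparse import csc_matrix

def prune_below(matrix, threshold):
    """Return a CSC copy of `matrix` with entries below `threshold` removed."""
    pruned = csc_matrix(matrix, copy=True)
    # Zero the small stored values, then drop them from the sparse structure.
    pruned.data[pruned.data < threshold] = 0
    pruned.eliminate_zeros()
    return pruned

# Made-up column-stochastic matrix with one very small entry.
original = csc_matrix(np.array([[0.5, 0.0005],
                                [0.5, 0.9995]]))
result = prune_below(original, 0.001)
print(original.nnz, result.nnz)  # 4 3
```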

Draw multiple graphs on different figures without closing the first ones

How can I have multiple windows open with different figures?

sequence clustering

Super excited that you've put this work into a Python MCL approach! Do you know if the input can be sequences? I'm assuming you first need to compute a sequence similarity matrix?

Thanks!

Two issues: "nx.to_scipy_sparse_matrix" does not exist and the inflation hyperparameter is not working?

Hi MCL Community,

I found two issues:

In the sample code, nx.to_scipy_sparse_matrix doesn't exist in networkx 3.0. The documentation needs to be updated to use nx.to_scipy_sparse_array.
I followed the documentation at https://markov-clustering.readthedocs.io/en/latest/readme.html and found that different values of the inflation hyperparameter (1.5 to 2.6) all gave the same results, and the modularity values differ from those on the doc page.

Also, inflation=1.1 and inflation=2.5 gave the same clustering outcome. I think there is a bug. Can someone check?

AttributeError: module 'markov_clustering' has no attribute 'draw_graph'

When I run this piece of code:

result = mc.run_mcl(graph_matrix_sparse, inflation=1.4)
clusters = mc.get_clusters(result)
mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver")

I get the message

----> 8 mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver")

AttributeError: module 'markov_clustering' has no attribute 'draw_graph'

markov-clustering-0.0.6.dev0 (tried markov-clustering-0.0.5.dev0 too)
python==3.6

no matches found: markov_clustering[drawing]

Hey there!

I'm trying to install this package using pip install markov_clustering[drawing] but I keep getting an error message:

no matches found: markov_clustering[drawing]

The standard version works fine, but I wanted to visualise the clusters/networks. Is there a workaround or suggestion for this?

Overlapping clusters

I'm getting clusters that have overlapping nodes. My assumption is that clusters should have completely distinct sets of nodes. Is my assumption wrong or is this an error?

RuntimeError: nnz of the result is too large

Traceback:

Traceback (most recent call last):
  File "cluster_network.py", line 19, in <module>
    result = mc.run_mcl(matrix, inflation=inflation)
  File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/markov_clustering/mcl.py", line 228, in run_mcl
    matrix = iterate(matrix, expansion, inflation)
  File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/markov_clustering/mcl.py", line 132, in iterate
    matrix = expand(matrix, expansion)
  File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/markov_clustering/mcl.py", line 51, in expand
    return matrix ** power
  File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/scipy/sparse/base.py", line 667, in __pow__
    return tmp * tmp
  File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/scipy/sparse/base.py", line 480, in __mul__
    return self._mul_sparse_matrix(other)
  File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 509, in _mul_sparse_matrix
    np.asarray(other.indices, dtype=idx_dtype))
RuntimeError: nnz of the result is too large

Script:

import markov_clustering as mc
import networkx as nx
import pandas as pd

g = nx.Graph()

chem_interactions = pd.read_csv('chemical_chemical.links.v5.0.tsv', sep="\t")

chem_interactions.columns = ['chem1', 'chem2', 'score']

chem_interactions = chem_interactions.query('score > 0')

for index, row in chem_interactions.iterrows():
    g.add_edge(row['chem1'], row['chem2'], weight=float(row['score']))

matrix = nx.to_scipy_sparse_matrix(g)

for inflation in [i / 10 for i in range(15, 26)]:
    result = mc.run_mcl(matrix, inflation=inflation)
    clusters = mc.get_clusters(result)
    Q = mc.modularity(matrix=result, clusters=clusters)
    print("inflation:", inflation, "modularity:", Q)

Any idea what's going wrong here?
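
A likely cause, stated as an assumption rather than a diagnosis: the traceback fails inside expand(), which computes matrix ** power, and multiplying a sparse matrix by itself can create far more non-zero entries than the input had. On a large, well-connected network the product may simply become too big to store. The toy star graph below (made up for illustration) shows how quickly the non-zero count grows after a single squaring.

```python
import numpy as np
from scipy.sparse import csr_matrix

n = 100  # one hub (node 0) connected to 99 leaves

hub = np.zeros(n - 1, dtype=int)
leaves = np.arange(1, n)
rows = np.concatenate([hub, leaves])
cols = np.concatenate([leaves, hub])
star = csr_matrix((np.ones(2 * (n - 1)), (rows, cols)), shape=(n, n))

# One expansion step with expansion=2: every pair of leaves becomes
# connected through the hub, so the product is almost completely dense.
squared = star @ star

print(star.nnz, squared.nnz)  # 198 9802
```

Raising the score cut-off before building the graph, or clustering connected components separately, would reduce the size of the product; whether either is acceptable depends on the data.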

Error when using the inflation argument with a value other than the default

Hi,
When I try to set different inflation values, I get this error. My adjacency matrix does not contain NA values, but it does contain negative ones.

/Users/gema/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py:38: RuntimeWarning: invalid value encountered in power
  return normalize(np.power(matrix, power))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-134-d5960f70e34a> in <module>
      2 # for each clustering run, calculate the modularity
      3 for inflation in [i / 10 for i in range(15, 26)]:
----> 4     result = mc.run_mcl(df, inflation=inflation)
      5     clusters = mc.get_clusters(result)
      6     Q = mc.modularity(matrix=result, clusters=clusters)

~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in run_mcl(matrix, expansion, inflation, loop_value, iterations, pruning_threshold, pruning_frequency, convergence_check_frequency, verbose)
    226 
    227         # perform MCL expansion and inflation
--> 228         matrix = iterate(matrix, expansion, inflation)
    229 
    230         # prune

~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in iterate(matrix, expansion, inflation)
    133 
    134     # Inflation
--> 135     matrix = inflate(matrix, inflation)
    136 
    137     return matrix

~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in inflate(matrix, power)
     36         return normalize(matrix.power(power))
     37 
---> 38     return normalize(np.power(matrix, power))
     39 
     40 

~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in normalize(matrix)
     21     :returns: The normalized matrix
     22     """
---> 23     return sklearn.preprocessing.normalize(matrix, norm="l1", axis=0)
     24 
     25 

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py in normalize(X, norm, axis, copy, return_norm)
   1902         raise ValueError("'%d' is not a supported axis" % axis)
   1903 
-> 1904     X = check_array(X, accept_sparse=sparse_format, copy=copy,
   1905                     estimator='the normalize function', dtype=FLOAT_DTYPES)
   1906     if axis == 0:

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    718 
    719         if force_all_finite:
--> 720             _assert_all_finite(array,
    721                                allow_nan=force_all_finite == 'allow-nan')
    722 

~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    101                 not allow_nan and not np.isfinite(X).all()):
    102             type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103             raise ValueError(
    104                     msg_err.format
    105                     (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
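
The RuntimeWarning at the top of the traceback points at np.power, and the negative entries are the probable trigger: in real arithmetic a negative base raised to a non-integer exponent is undefined, so NumPy returns NaN, which sklearn's normalize then rejects. The snippet below reproduces just that step with made-up values; an integer-valued exponent such as 2.0 does not hit the problem.

```python
import numpy as np

weights = np.array([-0.5, 0.5])  # made-up entries, one of them negative

with np.errstate(invalid="ignore"):      # silence the RuntimeWarning for the demo
    fractional = np.power(weights, 1.5)  # negative base, non-integer exponent
integer_like = np.power(weights, 2.0)

print(fractional)    # first entry is nan
print(integer_like)  # [0.25 0.25]
```

MCL treats the entries as non-negative similarities, so negative weights would need to be mapped to non-negative values before clustering; how to do that is a modelling decision.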

Draw edge Labels

I drew a graph with weights on some of the edges.

Could I show these weights in the plot when drawing my graph?

How should I cite this package?

Hi. This package helped me a lot with my research. Do you have any suggestions on how I should cite this package in my paper?

AttributeError: module 'markov_clustering' has no attribute 'run_mcl'

Hi,
I installed your package using pip as stated in the README, but can't get the example script running.
I guess it can't find the run_mcl function.

The error is:
AttributeError: module 'markov_clustering' has no attribute 'run_mcl'

Is there a problem in the released version of the package? If not, can you suggest how I can fix this?
I'm using Python 3.5.

Thanks

Network Graph from SciPy Hierarchical Clustering

Hi,

Using SciPy hierarchical clustering, how can we create the network graph?

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np

# generate two clusters: a with 100 points, b with 50:

np.random.seed(4711) # for repeatability of this tutorial
a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
X = np.concatenate((a, b),)
print(X.shape) # 150 samples with 2 dimensions
plt.scatter(X[:,0], X[:,1])
plt.show()

# generate the linkage matrix

Z = linkage(X, 'ward')

from scipy.cluster.hierarchy import cophenet
from scipy.spatial.distance import pdist

c, coph_dists = cophenet(Z, pdist(X))
c

# calculate full dendrogram

plt.figure(figsize=(25, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
    Z,
    leaf_rotation=90., # rotates the x axis labels
    leaf_font_size=8., # font size for the x axis labels
)
plt.show()

Now, how do we plot a network graph like yours instead of a dendrogram?
Kindly assist. Thank you!

conda-forge support

I was wondering if we could make this package available via conda? We can only install packages onto our servers via conda, so I am unable to use it as it stands.