guyallard / markov_clustering
markov clustering in python
License: MIT License
Thank you for creating such a wonderful tool for our community!
I have a file containing edge information on each line (for example: nodeX, nodeY, weight). The higher the weight, the shorter the distance between nodeX and nodeY.
I converted this file into a matrix using networkx (the add_weighted_edges_from and to_scipy_sparse_matrix functions). How does MCL treat the edge weights?
Thank you very much.
Regards,
Hai
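For reference, MCL treats weights as similarities, not distances: the adjacency matrix is column-normalized into transition probabilities, so a larger weight means a random walk is more likely to move along that edge. The tracebacks further down show the library's own `normalize`, `expand` and `inflate` steps; the sketch below is not the library's code, just a minimal NumPy version of one iteration under the assumption that weights are non-negative similarities. The names `normalize_columns` and `mcl_iteration` are hypothetical.

```python
import numpy as np

def normalize_columns(matrix):
    """Scale each column to sum to 1 (a column-stochastic matrix)."""
    sums = matrix.sum(axis=0)
    sums[sums == 0] = 1.0  # avoid division by zero for empty columns
    return matrix / sums

def mcl_iteration(matrix, expansion=2, inflation=2.0):
    """One expansion + inflation step of Markov clustering (sketch)."""
    # Expansion: let the random walk take `expansion` steps.
    expanded = np.linalg.matrix_power(matrix, expansion)
    # Inflation: raise entries to a power, then renormalize; this
    # strengthens strong transitions and weakens weak ones.
    return normalize_columns(np.power(expanded, inflation))

# Weighted adjacency: a higher weight means a stronger similarity.
adjacency = np.array([
    [0.0, 9.0, 1.0],
    [9.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
# Add self-loops (the library exposes a `loop_value` parameter for this), then normalize.
transitions = normalize_columns(adjacency + np.eye(3))
print(transitions[:, 0])  # approx. [0.091, 0.818, 0.091]: weight 9 dominates
step = mcl_iteration(transitions)
```

The weight-9 edge ends up with a transition probability of 9/11, so after expansion and inflation the two strongly linked nodes reinforce each other; only the relative weights within each column matter.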
I ran the example script and a data set of my own, and in both cases the graph plot popped up and closed itself very quickly. I have no issues with matplotlib anywhere else.
I'm using Python 3.6; all modules are up to date since I reinstalled them along with markov_clustering.
After run_mcl and get_clusters, I found that one node belongs to two clusters. Is this method doing a soft clustering?
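Whatever the cause, overlapping nodes are easy to detect and, if a hard partition is needed, to resolve afterwards. A minimal sketch, assuming `clusters` is a list of tuples of node indices (the shape `get_clusters` returns in the README example); the helper names are hypothetical, and keeping each node in the first cluster that lists it is an arbitrary tie-break, not part of MCL.

```python
from collections import Counter

def overlapping_nodes(clusters):
    """Return the set of nodes that appear in more than one cluster."""
    counts = Counter(node for cluster in clusters for node in set(cluster))
    return {node for node, count in counts.items() if count > 1}

def make_disjoint(clusters):
    """Keep each node only in the first cluster that contains it (arbitrary tie-break)."""
    seen = set()
    disjoint = []
    for cluster in clusters:
        kept = tuple(node for node in cluster if node not in seen)
        seen.update(kept)
        if kept:  # drop clusters that became empty
            disjoint.append(kept)
    return disjoint

clusters = [(0, 1, 2), (2, 3), (4,)]
print(overlapping_nodes(clusters))  # {2}
print(make_disjoint(clusters))      # [(0, 1, 2), (3,), (4,)]
```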
The example given includes 200 nodes and 2D coordinates. I am new to Markov Clustering. I can get sequence-sequence similarity from BLAST for each gene pair; for example, my current data looks like this:
Gene1 Gene2 60
Gene1 Gene3 70
Gene1 Gene4 65
...
Gene3 Gene4 34
How can I convert this to the format of given example in the manual? Thank you!
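One way to get from pairwise scores to the matrix the example uses is to map each gene name to a row/column index and fill a symmetric matrix. The sketch assumes whitespace-separated `name name score` lines and treats the score as a similarity (higher means more similar); `pairs_to_matrix` is a hypothetical helper name. The `run_mcl` call in the example accepts either a NumPy array or a SciPy sparse matrix.

```python
import numpy as np

def pairs_to_matrix(lines):
    """Build a symmetric similarity matrix from 'nodeA nodeB score' lines."""
    triples = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        a, b, score = fields
        triples.append((a, b, float(score)))

    # Give each distinct name a row/column index, in order of first appearance.
    index = {}
    for a, b, _ in triples:
        index.setdefault(a, len(index))
        index.setdefault(b, len(index))

    matrix = np.zeros((len(index), len(index)))
    for a, b, score in triples:
        i, j = index[a], index[b]
        matrix[i, j] = matrix[j, i] = score  # undirected: mirror the score
    return matrix, index

data = """Gene1 Gene2 60
Gene1 Gene3 70
Gene1 Gene4 65
Gene3 Gene4 34"""
matrix, index = pairs_to_matrix(data.splitlines())
# matrix can now be passed to mc.run_mcl(matrix); index maps gene names to rows.
```

For large BLAST outputs, building a `scipy.sparse.coo_matrix` from the same index arrays avoids allocating the full dense matrix.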
I'm trying to apply this implementation to a weighted graph.
Does the library support it? If yes, how can I use that feature?
result = mc.run_mcl(score, **kwargs)
clusters = mc.get_clusters(result)
modularity = mc.modularity(matrix=result, clusters=clusters)
Because score is an np.ndarray (np.matrix is deprecated and the data is not sparse), result is also an ndarray. This causes mc.modularity to crash in the call to convert_to_adjacency_matrix().
I found a quick fix which was to
modularity = mc.modularity(matrix=csr_matrix(result), clusters=clusters)
However, in cases where some of the data doesn't belong to any cluster, this crashes.
The code functions correctly if
modularity = mc.modularity(matrix=np.matrix(result), clusters=clusters)
However, np.matrix is deprecated.
The fix is fairly trivial:
if isspmatrix(matrix):
    col = find(matrix[:, i])[2]
else:
    col = matrix[:, i].T.tolist()[0]
becomes
if isspmatrix(matrix):
    col = find(matrix[:, i])[2]
elif isinstance(matrix, np.ndarray) and not isinstance(matrix, np.matrix):
    # plain ndarray: the column slice is already 1-D (np.matrix is an ndarray subclass)
    col = matrix[:, i].T.tolist()
else:
    col = matrix[:, i].T.tolist()[0]
I could raise a PR, but it's only 2 lines of code; not sure what's easier.
Many thanks
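The difference the fix relies on is easy to check in isolation: a column slice of an `np.matrix` stays 2-D, so `.T.tolist()` yields a nested list and `[0]` unwraps it, while a slice of a plain `ndarray` is 1-D, so `[0]` returns a single float. That float is what later produces the `TypeError: 'float' object is not iterable` reported further down. Because `np.matrix` is a subclass of `np.ndarray`, the `np.matrix` case has to be checked first. A sketch of the patched logic as a standalone function; the name `column_values` is hypothetical, and it uses `scipy.sparse.issparse` so that both sparse matrices and the newer sparse arrays take the sparse path.

```python
import numpy as np
from scipy.sparse import csr_matrix, find, issparse

def column_values(matrix, i):
    """Return the values of column i as a flat Python list."""
    if issparse(matrix):
        # [:, [i]] keeps the slice 2-D; find() returns (rows, cols, values)
        # of the stored nonzero entries only.
        return find(matrix[:, [i]])[2].tolist()
    if isinstance(matrix, np.matrix):
        # np.matrix slices stay 2-D: [[a, b, ...]], so unwrap with [0].
        return matrix[:, i].T.tolist()[0]
    # Plain ndarray: the column slice is already 1-D.
    return np.asarray(matrix)[:, i].tolist()

dense = np.array([[1.0, 2.0], [3.0, 4.0]])
print(dense[:, 0].T.tolist()[0])            # 1.0, a float rather than a list
print(column_values(dense, 0))              # [1.0, 3.0]
print(column_values(csr_matrix(dense), 0))  # [1.0, 3.0]
```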
Hi
I have 2 questions.
In the example using modularity, only the inflation is varied. When would we vary the expansion? Is there a case where we would want both a large expansion and a large inflation (as they act in opposite directions)?
Is there a way of getting a rough range for the inflation value, given an adjacency matrix? In my use case I needed an inflation value of 20 to get the appropriate clustering. I then squared the adjacency matrix and found that an inflation of about 5 was good. I am trying to build this code into a pipeline, so I won't be able to oversee the selection of groups; I worry that if my hyper-parameter range is off, I may not find optimal values (say, if my data changes).
Many thanks
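For an unattended pipeline, one option is to generalize the inflation loop used in the scripts quoted on this page to a grid over both expansion and inflation, keeping the pair with the highest modularity. The sketch takes the clustering and scoring steps as plain functions so it can be tested without the library; the names `sweep`, `best_on_edge`, `cluster_fn` and `score_fn` are hypothetical, and maximal modularity is only a heuristic for a "good" clustering.

```python
from itertools import product

def sweep(matrix, cluster_fn, score_fn, expansions, inflations):
    """Try every (expansion, inflation) pair; return the best pair and all scores."""
    scores = {}
    for expansion, inflation in product(expansions, inflations):
        result, clusters = cluster_fn(matrix, expansion, inflation)
        scores[(expansion, inflation)] = score_fn(result, clusters)
    best = max(scores, key=scores.get)
    return best, scores

def best_on_edge(best, inflations):
    """True if the best inflation is the smallest or largest value tried."""
    return best[1] in (min(inflations), max(inflations))

# With markov_clustering installed, the callbacks might look like:
#   def cluster_fn(m, e, i):
#       result = mc.run_mcl(m, expansion=e, inflation=i)
#       return result, mc.get_clusters(result)
#   def score_fn(result, clusters):
#       return mc.modularity(matrix=result, clusters=clusters)
```

If `best_on_edge` is true, the optimum may lie outside the range that was searched, which is a simple signal for a pipeline to widen the grid and rerun instead of silently accepting a boundary value.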
When I call mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver"), I get the error: AttributeError: module 'matplotlib.cm' has no attribute 'tab20'.
How do I fix this? Thank you.
Hi
Currently I have to use Python 2.*. Does this package work on older versions of Python? If not, would it be possible to make it work with Python 2.*?
Cheers
I just tried to run the example code given in the GitHub README.
import markov_clustering as mc
import networkx as nx
import random
# number of nodes to use
numnodes = 200
# generate random positions as a dictionary where the key is the node id and the value
# is a tuple containing 2D coordinates
positions = {i:(random.random() * 2 - 1, random.random() * 2 - 1) for i in range(numnodes)}
# use networkx to generate the graph
network = nx.random_geometric_graph(numnodes, 0.3, pos=positions)
# then get the adjacency matrix (in sparse form)
matrix = nx.to_scipy_sparse_matrix(network)
result = mc.run_mcl(matrix)
Gives the following error:
Traceback (most recent call last):
File "/home/baqir/code/email-sentiment-analysis/algorithms/markov_clustering_visual.py", line 22, in <module>
result = mc.run_mcl(matrix)
File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/markov_clustering/mcl.py", line 233, in run_mcl
matrix = prune(matrix, pruning_threshold)
File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/markov_clustering/mcl.py", line 93, in prune
pruned[matrix >= threshold] = matrix[matrix >= threshold]
File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/scipy/sparse/_index.py", line 109, in __setitem__
raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment
I don't understand whether this is a scipy error or a markov-clustering error, or whether I am not passing valid arguments.
Hello,
I am trying to use draw_graph but there is an error that:
module 'markov_clustering' has no attribute 'draw_graph'.
The same code works in Google Colab but not in Jupyter Notebook.
Could you please help me with this issue?
Thank you
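The console session further down shows the package printing `Matplotlib not present` at import time, which suggests `draw_graph` is only exposed when matplotlib can be imported. A Jupyter kernel can run a different Python interpreter than the one `pip` installed into, so a quick check from inside the notebook is which interpreter is running and whether it can see matplotlib. A standard-library sketch; `can_import` is a hypothetical helper name.

```python
import importlib.util
import sys

def can_import(module_name):
    """True if module_name can be found by the running interpreter."""
    return importlib.util.find_spec(module_name) is not None

print("interpreter:", sys.executable)
for name in ("matplotlib", "markov_clustering"):
    print(name, "available:", can_import(name))
```

If matplotlib is missing here but present in Colab, installing it from within the notebook (for example with the `%pip install matplotlib` magic in recent IPython versions) targets the kernel's own interpreter; the package then needs to be re-imported after restarting the kernel.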
When calculating modularity Q, I get the following error in my script that uses the markov_clustering package:
Traceback (most recent call last):
File "mcl_3_inflat_test.py", line 48, in <module>
Q = mc.modularity(matrix=result, clusters=clusters)
File "/usr/local/lib/python3.7/site-packages/markov_clustering/modularity.py", line 74, in modularity
matrix = convert_to_adjacency_matrix(matrix)
File "/usr/local/lib/python3.7/site-packages/markov_clustering/modularity.py", line 39, in convert_to_adjacency_matrix
coeff = max( Fraction(c).limit_denominator().denominator for c in col )
TypeError: 'float' object is not iterable
Shouldn't
Q = mc.modularity(matrix=result, clusters=clusters)
as it reads in the instructions, instead be:
Q = mc.modularity(matrix, clusters=clusters)
?
Or otherwise...?
Thanks a lot.
ValueError Traceback (most recent call last)
in
----> 1 result = mc.run_mcl(matrix) # run MCL with default parameters
2 clusters = mc.get_clusters(result) # get clusters
~\Anaconda3\lib\site-packages\markov_clustering\mcl.py in run_mcl(matrix, expansion, inflation, loop_value, iterations, pruning_threshold, pruning_frequency, convergence_check_frequency, verbose)
231 if pruning_threshold > 0 and i % pruning_frequency == pruning_frequency - 1:
232 printer.print("Pruning")
--> 233 matrix = prune(matrix, pruning_threshold)
234
235 # Check for convergence
~\Anaconda3\lib\site-packages\markov_clustering\mcl.py in prune(matrix, threshold)
91 if isspmatrix(matrix):
92 pruned = dok_matrix(matrix.shape)
---> 93 pruned[matrix >= threshold] = matrix[matrix >= threshold]
94 pruned = pruned.tocsc()
95 else:
~\Anaconda3\lib\site-packages\scipy\sparse\_index.py in __setitem__(self, key, x)
122 x, _ = _broadcast_arrays(x, i)
123 if x.shape != i.shape:
--> 124 raise ValueError("shape mismatch in assignment")
125 if x.size == 0:
126 return
ValueError: shape mismatch in assignment
Hi,
I am using it to identify clusters of cytology cells. I have a feature set for around 50K cells, which I can use to find clusters using the example provided. But what if I get a number of new cells and want to check which cluster each belongs to? How can I find their clusters?
Ruqayya
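MCL itself has no predict step: it clusters the nodes of the graph it is given. Two pragmatic options are to rerun MCL with the new cells added to the graph, or to assign each new cell to the existing cluster it is most similar to on average. A sketch of the second option, which is a heuristic outside MCL; it assumes you can compute the same similarity measure between a new cell and every clustered cell, and that `clusters` is a list of tuples of node indices like the output of `get_clusters`. The name `assign_to_cluster` is hypothetical.

```python
import numpy as np

def assign_to_cluster(similarities, clusters):
    """Return the index of the cluster with the highest mean similarity.

    similarities: 1-D sequence, similarity of the new cell to each clustered cell.
    clusters: list of tuples of node indices.
    """
    similarities = np.asarray(similarities, dtype=float)
    means = [similarities[list(cluster)].mean() for cluster in clusters]
    return int(np.argmax(means))

clusters = [(0, 1, 2), (3, 4)]
new_cell = [0.1, 0.2, 0.0, 0.9, 0.7]
print(assign_to_cluster(new_cell, clusters))  # 1
```

Mean similarity is one choice; maximum similarity (nearest neighbour) or a cut-off for "no good match" are alternatives, and rerunning MCL on the full set remains the option that is faithful to the algorithm.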
Hi!
I am passing an undirected, weighted networkx graph, g:
matrix = nx.to_scipy_sparse_matrix(g)
But when I try to run this code:
result = mc.run_mcl(matrix, inflation=1.4)
clusters = mc.get_clusters(result)
mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver")
it gives me an AttributeError: Argmax not found.
Do you know why it's happening?
Appreciate the help :)
I just copied the README example in a python console:
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import markov_clustering as mc
Matplotlib not present
Visualization not supported to missing libraries.
>>> import networkx as nx
>>> import random
>>>
>>> # number of nodes to use
... numnodes = 200
>>>
>>> # generate random positions as a dictionary where the key is the node id and the value
... # is a tuple containing 2D coordinates
... positions = {i:(random.random() * 2 - 1, random.random() * 2 - 1) for i in range(numnodes)}
>>>
>>> network = nx.random_geometric_graph(numnodes, 0.3, pos=positions)
>>>
>>> # then get the adjacency matrix (in sparse form)
... matrix = nx.to_scipy_sparse_matrix(network)
>>> result = mc.run_mcl(matrix) # run MCL with default parameters
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/env/export/cns_n02_scratch/scratch_agc/scratch/agc/users/ggautrea/miniconda3/lib/python3.7/site-packages/markov_clustering-0.0.6.dev0-py3.7.egg/markov_clustering/mcl.py", line 233, in run_mcl
matrix = prune(matrix, pruning_threshold)
File "/env/export/cns_n02_scratch/scratch_agc/scratch/agc/users/ggautrea/miniconda3/lib/python3.7/site-packages/markov_clustering-0.0.6.dev0-py3.7.egg/markov_clustering/mcl.py", line 93, in prune
pruned[matrix >= threshold] = matrix[matrix >= threshold]
File "/env/export/cns_n02_scratch/scratch_agc/scratch/agc/users/ggautrea/miniconda3/lib/python3.7/site-packages/scipy/sparse/_index.py", line 109, in __setitem__
raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment
>>> clusters = mc.get_clusters(result) # get clusters
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'result' is not defined
>>> import scipy
>>> scipy.__version__
'1.3.0'
But if I use scipy 1.2.0 instead of scipy 1.3.0, it works fine.
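The three tracebacks on this page fail at the same line of `prune`, a boolean-mask assignment into a `dok_matrix`, and this report ties the failure to SciPy 1.3.0 versus 1.2.0. Pinning SciPy is one workaround; another is to prune without boolean-mask assignment by editing the stored values directly. A sketch under stated assumptions: `prune_sparse` is a hypothetical name, it is not the library's code, and keeping the largest entry of every column (so no column is emptied) is an assumption about what pruning should preserve.

```python
import numpy as np
from scipy.sparse import csc_matrix

def prune_sparse(matrix, threshold):
    """Zero out entries below threshold, keeping each column's largest entry."""
    pruned = csc_matrix(matrix, dtype=float, copy=True)
    pruned.sum_duplicates()  # one stored value per (row, col)
    for j in range(pruned.shape[1]):
        start, end = pruned.indptr[j], pruned.indptr[j + 1]
        column = pruned.data[start:end]  # a view into the stored values
        if column.size == 0:
            continue
        keep = column >= threshold
        keep[np.argmax(column)] = True  # never empty a column completely
        column[~keep] = 0.0
    pruned.eliminate_zeros()  # drop the explicit zeros from storage
    return pruned

m = csc_matrix(np.array([[0.5, 0.0001], [0.0002, 0.0003]]))
print(prune_sparse(m, 0.001).toarray())
```

The per-column Python loop keeps the sketch readable; on very large graphs a vectorized version would be faster.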
How can I open multiple windows with different figures?
Super excited that you've put this work into a Python MCL approach! Do you know if the input can be sequences? I'm assuming you first need to compute a sequence similarity matrix?
Thanks!
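MCL operates on a graph, so sequences first have to be turned into pairwise similarities; in practice that is usually an alignment score such as the BLAST scores mentioned in another question on this page. Purely to show the shape of that step, the sketch below uses the standard library's `difflib.SequenceMatcher.ratio()` as a toy stand-in for a real alignment score. It is not a biologically meaningful similarity, and `similarity_matrix` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

import numpy as np

def similarity_matrix(sequences, min_similarity=0.0):
    """Symmetric matrix of pairwise SequenceMatcher ratios (toy similarity)."""
    n = len(sequences)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            score = SequenceMatcher(None, sequences[i], sequences[j]).ratio()
            if score >= min_similarity:  # drop weak edges to keep the graph sparse
                matrix[i, j] = matrix[j, i] = score
    return matrix

seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
sim = similarity_matrix(seqs, min_similarity=0.5)
```

For real gene data, replacing the inner score with BLAST bit scores or percent identity (or reading the BLAST output directly) follows the same pattern.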
Hi MCL Community,
I found two issues:
nx.to_scipy_sparse_matrix doesn't exist in networkx 3.0. The documentation needs to be updated to use nx.to_scipy_sparse_array.
Also, inflation=1.1 and inflation=2.5 gave the same clustering outcome. I think there is a bug. Can someone check?
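Until the README is updated, a small helper can pick whichever converter the installed networkx provides. It also wraps the result in `scipy.sparse.csr_matrix`, because since SciPy 1.11 `isspmatrix` (which the library's `prune` code in the tracebacks on this page branches on) returns `False` for the newer sparse-array type that `to_scipy_sparse_array` returns. `to_sparse_adjacency` is a hypothetical name, and the networkx module is passed in as an argument so the helper can be exercised without networkx installed.

```python
from scipy.sparse import csr_matrix

def to_sparse_adjacency(nx_module, graph, **kwargs):
    """Return the graph's adjacency as a scipy csr_matrix on old and new networkx."""
    # networkx 3.0 removed to_scipy_sparse_matrix; to_scipy_sparse_array replaces it.
    converter = getattr(nx_module, "to_scipy_sparse_array", None)
    if converter is None:
        converter = nx_module.to_scipy_sparse_matrix
    # csr_matrix() turns a sparse array (or a dense array) into a sparse *matrix*.
    return csr_matrix(converter(graph, **kwargs))

# Typical use:
#   import networkx as nx
#   matrix = to_sparse_adjacency(nx, network)
#   result = mc.run_mcl(matrix)
```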
When I run this piece of code:
result = mc.run_mcl(graph_matrix_sparse, inflation=1.4)
clusters = mc.get_clusters(result)
mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver")
I get this message:
----> 8 mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False, edge_color="silver")
AttributeError: module 'markov_clustering' has no attribute 'draw_graph'
markov-clustering-0.0.6.dev0 (tried markov-clustering-0.0.5.dev0 too)
python==3.6
Hey there!
I'm trying to install this package using pip install markov_clustering[drawing]
but I keep getting an error message:
no matches found: markov_clustering[drawing]
The standard version is working fine but I wanted to visualise the clusters/networks. Is there any workaround/suggestion for this?
I'm getting clusters that have overlapping nodes. My assumption is that clusters should have completely distinct sets of nodes. Is my assumption wrong or is this an error?
Traceback:
Traceback (most recent call last):
File "cluster_network.py", line 19, in <module>
result = mc.run_mcl(matrix, inflation=inflation)
File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/markov_clustering/mcl.py", line 228, in run_mcl
matrix = iterate(matrix, expansion, inflation)
File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/markov_clustering/mcl.py", line 132, in iterate
matrix = expand(matrix, expansion)
File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/markov_clustering/mcl.py", line 51, in expand
return matrix ** power
File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/scipy/sparse/base.py", line 667, in __pow__
return tmp * tmp
File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/scipy/sparse/base.py", line 480, in __mul__
return self._mul_sparse_matrix(other)
File "/home/itan_lab/david/.conda/envs/goflof/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 509, in _mul_sparse_matrix
np.asarray(other.indices, dtype=idx_dtype))
RuntimeError: nnz of the result is too large
Script:
import markov_clustering as mc
import networkx as nx
import pandas as pd
g = nx.Graph()
chem_interactions = pd.read_csv('chemical_chemical.links.v5.0.tsv', sep="\t")
chem_interactions.columns = ['chem1', 'chem2', 'score']
chem_interactions = chem_interactions.query('score > 0')
for index, row in chem_interactions.iterrows():
g.add_edge(row['chem1'], row['chem2'], weight=float(row['score']))
matrix = nx.to_scipy_sparse_matrix(g)
for inflation in [i / 10 for i in range(15, 26)]:
result = mc.run_mcl(matrix, inflation=inflation)
clusters = mc.get_clusters(result)
Q = mc.modularity(matrix=result, clusters=clusters)
print("inflation:", inflation, "modularity:", Q)
Any idea what's going wrong here?
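The `nnz of the result is too large` error is raised while squaring the matrix during expansion: the square of a sparse graph can be far denser than the graph itself, and for a network of this size the number of nonzeros in the product apparently exceeds what SciPy can handle. Because a random walk never leaves its connected component, MCL clusters never span components, so one way to shrink the problem is to run MCL separately on each component. A sketch using `scipy.sparse.csgraph.connected_components`; the name `component_submatrices` is hypothetical, and a single very large component would still need a sparser graph (for example a higher score cutoff than `score > 0`).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def component_submatrices(matrix):
    """Yield (node_indices, submatrix) for each connected component."""
    matrix = csr_matrix(matrix)
    n_components, labels = connected_components(matrix, directed=False)
    for component in range(n_components):
        nodes = np.flatnonzero(labels == component)
        yield nodes, matrix[nodes][:, nodes]

# Usage with the library (not run here):
#   for nodes, sub in component_submatrices(matrix):
#       result = mc.run_mcl(sub, inflation=inflation)
#       # map local indices back to rows of the full matrix
#       clusters = [tuple(nodes[list(c)]) for c in mc.get_clusters(result)]
```

Rows of a matrix produced by `nx.to_scipy_sparse_matrix(g)` follow the order of `g.nodes()`, so the returned indices can be mapped back to chemical IDs through that list.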
Hi,
When I try to set different inflation values, I get this error. My adjacency matrix does not contain NA values, but it does contain negative ones.
/Users/gema/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py:38: RuntimeWarning: invalid value encountered in power
return normalize(np.power(matrix, power))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-134-d5960f70e34a> in <module>
2 # for each clustering run, calculate the modularity
3 for inflation in [i / 10 for i in range(15, 26)]:
----> 4 result = mc.run_mcl(df, inflation=inflation)
5 clusters = mc.get_clusters(result)
6 Q = mc.modularity(matrix=result, clusters=clusters)
~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in run_mcl(matrix, expansion, inflation, loop_value, iterations, pruning_threshold, pruning_frequency, convergence_check_frequency, verbose)
226
227 # perform MCL expansion and inflation
--> 228 matrix = iterate(matrix, expansion, inflation)
229
230 # prune
~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in iterate(matrix, expansion, inflation)
133
134 # Inflation
--> 135 matrix = inflate(matrix, inflation)
136
137 return matrix
~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in inflate(matrix, power)
36 return normalize(matrix.power(power))
37
---> 38 return normalize(np.power(matrix, power))
39
40
~/opt/anaconda3/lib/python3.8/site-packages/markov_clustering/mcl.py in normalize(matrix)
21 :returns: The normalized matrix
22 """
---> 23 return sklearn.preprocessing.normalize(matrix, norm="l1", axis=0)
24
25
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py in normalize(X, norm, axis, copy, return_norm)
1902 raise ValueError("'%d' is not a supported axis" % axis)
1903
-> 1904 X = check_array(X, accept_sparse=sparse_format, copy=copy,
1905 estimator='the normalize function', dtype=FLOAT_DTYPES)
1906 if axis == 0:
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
718
719 if force_all_finite:
--> 720 _assert_all_finite(array,
721 allow_nan=force_all_finite == 'allow-nan')
722
~/opt/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
101 not allow_nan and not np.isfinite(X).all()):
102 type_err = 'infinity' if allow_nan else 'NaN, infinity'
--> 103 raise ValueError(
104 msg_err.format
105 (type_err,
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
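The `RuntimeWarning: invalid value encountered in power` is the first clue: inflation raises every entry to a power, and a negative number raised to a fractional power such as 1.5 has no real value, so NumPy returns NaN, which the later normalization step rejects. MCL interprets the matrix as non-negative transition weights. How to map negative values is a modelling decision; the sketch shows the failure and two options, clipping negatives to zero (treating them as "no edge") or taking absolute values (treating strong negative association as similarity). `make_nonnegative` is a hypothetical helper name.

```python
import numpy as np

weights = np.array([[1.0, -0.5], [-0.5, 1.0]])
with np.errstate(invalid="ignore"):  # silence the warning for the demonstration
    inflated = np.power(weights, 1.5)
print(np.isnan(inflated).any())  # True: (-0.5) ** 1.5 has no real value

def make_nonnegative(matrix, strategy="clip"):
    """Map a similarity matrix with negative entries to non-negative weights."""
    matrix = np.asarray(matrix, dtype=float)
    if strategy == "clip":
        return np.clip(matrix, 0.0, None)  # negative -> no edge
    if strategy == "abs":
        return np.abs(matrix)  # keep the strength, ignore the sign
    raise ValueError(f"unknown strategy: {strategy!r}")
```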
I draw a graph with weights on some of the edges.
Could I show these weights in the plot when drawing the graph?
Hi. This package helped me a lot with my research. Do you have any suggestion on how I could cite this package in my paper?
Hi,
I installed your package using pip as stated on the readme, but can't get the example script running.
I guess it can't find the run_mcl method/function.
The error is:
AttributeError: module 'markov_clustering' has no attribute 'run_mcl'
Is there a problem in the released version of the package? If not, can you suggest how I can fix this?
I'm using Python 3.5
Thanks
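A common cause of `module ... has no attribute` for a documented function is that `import markov_clustering` picks up something else with the same name, for example a local `markov_clustering.py` script or a folder next to the script being run. Checking where the name resolves makes this visible. A standard-library sketch; `module_origin` is a hypothetical helper name.

```python
import importlib.util

def module_origin(name):
    """Return where `name` would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin not in (None, "namespace"):
        return spec.origin  # e.g. .../site-packages/markov_clustering/__init__.py
    # A bare folder without __init__.py imports as a namespace package.
    return ", ".join(spec.submodule_search_locations or [])

print(module_origin("markov_clustering"))
```

A path inside `site-packages` is expected; a path inside your own project directory means a local file or folder is shadowing the installed package.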
Hi,
Using SciPy hierarchical clustering, how can we create the network graph?
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
np.random.seed(4711) # for repeatability of this tutorial
a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[100,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[50,])
X = np.concatenate((a, b),)
print(X.shape)  # 150 samples with 2 dimensions
plt.scatter(X[:,0], X[:,1])
plt.show()
Z = linkage(X, 'ward')
from scipy.cluster.hierarchy import cophenet
from scipy.spatial.distance import pdist
c, coph_dists = cophenet(Z, pdist(X))
print(c)  # cophenetic correlation coefficient
plt.figure(figsize=(25, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
Z,
leaf_rotation=90., # rotates the x axis labels
leaf_font_size=8., # font size for the x axis labels
)
plt.show()
Now, how do we plot a network graph like yours instead of a dendrogram?
Kindly assist. Thank you!
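A dendrogram and the README's network plot come from different objects: the network plot is drawn from a graph whose edges connect nearby points (the README builds it with `nx.random_geometric_graph` and a radius of 0.3). A similar adjacency matrix can be built directly from the `X` above by linking points closer than some radius, then fed to `mc.run_mcl` and drawn like the README example. A sketch with SciPy's `pdist`/`squareform`; the helper name `radius_graph` and the radius value are assumptions to tune for your data.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def radius_graph(points, radius):
    """0/1 adjacency matrix linking points closer than `radius` (no self-loops)."""
    distances = squareform(pdist(points))  # full n x n Euclidean distance matrix
    adjacency = (distances < radius).astype(float)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

# With X from the snippet above (radius=2.0 is a guess, not a recommendation):
#   from scipy.sparse import csr_matrix
#   matrix = csr_matrix(radius_graph(X, radius=2.0))  # sparse, like the README input
#   result = mc.run_mcl(matrix)
#   clusters = mc.get_clusters(result)
#   positions = {i: tuple(p) for i, p in enumerate(X)}
#   mc.draw_graph(matrix, clusters, pos=positions, node_size=50, with_labels=False)
```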
Hi
I was wondering if we could make this package available via conda? We can only install packages onto our servers via conda, so I am unable to use it as it stands.
Thanks