cwtsleiden / networkanalysis

Java package that provides data structures and algorithms for network analysis.

License: MIT License

Java 100.00%

network-analysis clustering-algorithm community-detection clustering java layout layout-algorithm leiden-algorithm louvain-algorithm mapping

networkanalysis's Issues

java.lang.ArrayIndexOutOfBoundsException when using -q Modularity

The following command:

    java -cp ~/Bioinformatics/networkanalysis/build/libs/networkanalysis-1.1.0-5-ga3f342d.jar nl.cwts.networkanalysis.run.RunNetworkClustering -q CPM -m 1 -w -o /tmp/edges_clustering.txt edges.txt

runs just fine. However, if the -q option is set to Modularity, I get the following crash:

    RunNetworkClustering version 1.1.0
    By Vincent Traag, Ludo Waltman, and Nees Jan van Eck
    Centre for Science and Technology Studies (CWTS), Leiden University
    Reading edge list from 'edges.txt'.
    Reading edge list took 0s.
    Network consists of 18386 nodes and 423926 edges with a total edge weight of 24546.50103795767.
    Using singleton initial clustering.
    Running Leiden algorithm.
    Quality function: Modularity
    Resolution parameter: 1.0
    Minimum cluster size: 1
    Number of random starts: 1
    Number of iterations: 10
    Randomness parameter: 0.01
    Random number generator seed: random
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 5
        at nl.cwts.networkanalysis.LocalMergingAlgorithm.findClustering(LocalMergingAlgorithm.java:190)
        at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:228)
        at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:276)
        at nl.cwts.networkanalysis.IterativeCPMClusteringAlgorithm.improveClustering(IterativeCPMClusteringAlgorithm.java:91)
        at nl.cwts.networkanalysis.run.RunNetworkClustering.main(RunNetworkClustering.java:413)

I've attached the input file (edges.txt).
I'm not sure which property of this input file would violate the preconditions of the software: the nodes are 0-indexed numbers, there are no duplicate edges, the left-hand node ID is always less than the right-hand node ID, etc.

Question about clustering results

Hi,

Thanks for this fantastic package!
I have an issue related to the output of the clustering algorithm.
I ran the following code in Python:

    import os

    os.system("java -cp /Applications/networkanalysis/networkanalysis-1.3.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering"
              " -n AssociationStrength -r 1 -m 20 --sorted-edge-list"
              " -o clusters.txt"
              " data_net.txt")

The output message reports 829 clusters, and 760 clusters after removing clusters consisting of fewer than 20 nodes.
However, when I open the file clusters.txt, the number of clusters is 681 (cluster IDs run from 0 to 759, with gaps), and many clusters contain fewer than 20 nodes.

Many thanks in advance for your explanation.
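To check the output file independently of the log message, a small sketch that tallies the nodes per cluster ID can help. It assumes the two tab-delimited columns (node, cluster) that another issue on this page reports as the actual output format; the class name ClusterSizes is hypothetical and not part of the library.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class ClusterSizes {
    // Counts the nodes assigned to each cluster ID, assuming one
    // "node<TAB>cluster" pair per line. The TreeMap keeps IDs sorted.
    static TreeMap<Integer, Integer> countSizes(Reader in) {
        TreeMap<Integer, Integer> sizes = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;
                String[] fields = line.trim().split("\\s+");
                sizes.merge(Integer.parseInt(fields[1]), 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    // Number of clusters with fewer than minSize nodes.
    static long countSmall(Map<Integer, Integer> sizes, int minSize) {
        return sizes.values().stream().filter(s -> s < minSize).count();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClusterSizes clusters.txt 20
        TreeMap<Integer, Integer> sizes = countSizes(new FileReader(args[0]));
        int minSize = Integer.parseInt(args[1]);
        System.out.println("Distinct cluster IDs: " + sizes.size());
        System.out.println("Largest cluster ID: " + (sizes.isEmpty() ? "none" : sizes.lastKey()));
        System.out.println("Clusters with fewer than " + minSize + " nodes: " + countSmall(sizes, minSize));
    }
}
```

Comparing the number of distinct IDs with the largest ID shows directly how many gaps the numbering has, and the last line shows whether the minimum cluster size was applied to the file.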

setResolution does not set resolution for local moving algorithm

The resolution is not set for the localMovingAlgorithm of the LeidenAlgorithm and the LouvainAlgorithm. We should simply add

    @Override
    public void setResolution(double resolution)
    {
        super.setResolution(resolution);
        // Also pass the new resolution on to the local moving algorithm
        this.localMovingAlgorithm.resolution = resolution;
    }

to both classes.
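The underlying pattern is an outer algorithm that owns an inner algorithm with its own copy of the resolution. The self-contained sketch below uses hypothetical, simplified stand-ins (not the library's actual classes); only the names setResolution, localMovingAlgorithm, and resolution are taken from the issue.

```java
// Hypothetical, simplified stand-ins for the library's classes.
class BaseClusteringAlgorithm {
    protected double resolution = 1.0;

    void setResolution(double resolution) {
        this.resolution = resolution;
    }

    double getResolution() {
        return resolution;
    }
}

class SimpleLocalMovingAlgorithm extends BaseClusteringAlgorithm {
}

class SimpleLeidenAlgorithm extends BaseClusteringAlgorithm {
    protected final SimpleLocalMovingAlgorithm localMovingAlgorithm = new SimpleLocalMovingAlgorithm();

    @Override
    void setResolution(double resolution) {
        super.setResolution(resolution);
        // Without this line the inner algorithm keeps its default of 1.0.
        this.localMovingAlgorithm.resolution = resolution;
    }
}

class ResolutionDemo {
    public static void main(String[] args) {
        SimpleLeidenAlgorithm leiden = new SimpleLeidenAlgorithm();
        leiden.setResolution(0.2);
        System.out.println(leiden.localMovingAlgorithm.getResolution());
    }
}
```

Removing the propagating line from the override makes the demo print 1.0, which is the symptom the issue describes.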

Warning: NetworkClustering fills gaps in the node IDs of your input network with isolated clusters

Input: edge list; output: cluster assignment for each node.

I have isolates in the network, so the edge list does not contain any information about them.
However, RunNetworkClustering.jar assigns cluster numbers sequentially to all nodes, including those that do not occur in the input edge list. In other words, NetworkClustering fills the gaps in the node IDs of your input network with isolated clusters.
I have attached my edge list input (Networkcluster warning example-edgelist.txt; node IDs range from 0 to 76, without 36) and the clusters output (clusters.txt), which contains a cluster for node 36.
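One workaround is to relabel the node IDs that actually occur in the edge list as contiguous indices 0..n-1 before clustering, then translate the cluster output back. The sketch below is a minimal version of that idea; the class NodeIdRemapper is hypothetical and not part of the library, and it assigns indices in order of first appearance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Maps the node IDs that occur in an edge list to contiguous indices
// 0..n-1 (in order of first appearance), and back again.
class NodeIdRemapper {
    private final Map<Integer, Integer> toIndex = new HashMap<>();
    private final List<Integer> originalIds = new ArrayList<>();

    int index(int originalId) {
        Integer idx = toIndex.get(originalId);
        if (idx == null) {
            idx = originalIds.size();
            toIndex.put(originalId, idx);
            originalIds.add(originalId);
        }
        return idx;
    }

    // Rewrites {source, target} pairs in terms of contiguous indices.
    int[][] remap(int[][] edges) {
        int[][] out = new int[edges.length][];
        for (int i = 0; i < edges.length; i++) {
            out[i] = new int[] { index(edges[i][0]), index(edges[i][1]) };
        }
        return out;
    }

    int originalId(int index) {
        return originalIds.get(index);
    }

    int nodeCount() {
        return originalIds.size();
    }

    public static void main(String[] args) {
        NodeIdRemapper m = new NodeIdRemapper();
        int[][] remapped = m.remap(new int[][] { { 35, 37 }, { 37, 76 } });
        // Node 36 never appears, so only three nodes are created.
        System.out.println(m.nodeCount() + " nodes; index " + remapped[0][1] + " is node " + m.originalId(remapped[0][1]));
    }
}
```

Genuine isolates never appear in an edge list, so they would still have to be added back to the output separately after translating the cluster file with originalId.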

Self-loops are not properly considered when using modularity

We ignore all self-loops when reading a network. Although this works fine for CPM (self-loops have no effect there), it is not entirely correct for modularity. In particular, the nodeWeights are calculated using getTotalEdgeWeightPerNodeHelper, but this is called only after the self-loops have been removed. In other words, the nodeWeights reflect the degree without the self-loops.

In networkanalysis/src/main/java/nl/cwts/networkanalysis/Network.java, line 1371 in 42fdd9e:

    this.nodeWeights = (nodeWeights != null) ? nodeWeights.clone() : (setNodeWeightsToTotalEdgeWeights ? getTotalEdgeWeightPerNodeHelper() : nl.cwts.util.Arrays.createDoubleArrayOfOnes(nNodes));

In addition, we do consider self-loops when calculating the "proper" resolution parameter, in networkanalysis/src/main/java/nl/cwts/networkanalysis/run/RunNetworkClustering.java, line 405 in 42fdd9e:

    double resolution2 = useModularity ? (resolution / (2 * network.getTotalEdgeWeight() + network.getTotalEdgeWeightSelfLinks())) : resolution;

Both issues should be corrected in order to make modularity work correctly when self-loops are present in a network.
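A minimal sketch of computing node weights before self-loops are dropped, so that they agree with the denominator 2 * network.getTotalEdgeWeight() + network.getTotalEdgeWeightSelfLinks() in the resolution2 expression quoted above. The class and method names are hypothetical, and counting a self-loop's weight once toward its node is an assumption chosen to match that denominator; tools such as NetworkX count a self-loop twice in a node's degree, which would require a different denominator.

```java
// Weighted degree per node for an undirected edge list that may contain
// self-loops. A self-loop of weight w adds w (once) to its node, so the
// weights sum to 2 * (non-self-loop weight) + (self-loop weight).
class SelfLoopNodeWeights {
    static double[] nodeWeights(int nNodes, int[][] edges, double[] weights) {
        double[] k = new double[nNodes];
        for (int e = 0; e < edges.length; e++) {
            int i = edges[e][0];
            int j = edges[e][1];
            k[i] += weights[e];
            if (i != j) {
                k[j] += weights[e]; // self-loops are counted only once
            }
        }
        return k;
    }

    public static void main(String[] args) {
        double[] k = nodeWeights(3, new int[][] { { 0, 1 }, { 1, 2 }, { 2, 2 } }, new double[] { 1.0, 2.0, 0.5 });
        System.out.println(java.util.Arrays.toString(k));
    }
}
```

Whatever convention is chosen, the point of the issue is that node weights and the resolution denominator must use the same one.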

Is there a version 1.1.0 standalone jar package released?

I only found the source code and the Maven dependencies.

Documentation Error: Output Formatting

FYI, it seems that the single-column output format described in your README does not match the actual output format (two tab-delimited columns: node and cluster).

Error message: duplicate values while creating network

Great tool!

I get the following error when I try to run RunNetworkClustering.jar on a large network:

    Error while creating network: For each node, corresponding elements of neighbors array must not include duplicate values.

May I propose making the error message more explicit? Does it mean that the program found at least one node (source) with the same target multiple times, or something else?

Just for clarification: I double-checked my input file. It has no duplicates (distinct) and no self-loops (source_node <> target_node).
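A DISTINCT over (source, target) rows does not catch an edge listed in both directions, such as "3 7" and later "7 3". If the tool treats each row as an undirected edge and stores it under both endpoints (an assumption, not something confirmed here), such a pair would make 7 appear twice among the neighbors of 3. The hypothetical helper below reports rows whose unordered pair has already been seen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UndirectedDuplicateFinder {
    // Returns the (0-based) row indices of edges whose unordered pair {a, b}
    // already appeared on an earlier row, in either direction.
    static List<Integer> findDuplicates(int[][] edges) {
        Set<Long> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (int r = 0; r < edges.length; r++) {
            int a = Math.min(edges[r][0], edges[r][1]);
            int b = Math.max(edges[r][0], edges[r][1]);
            long key = ((long) a << 32) | (b & 0xffffffffL); // pack the pair into one long
            if (!seen.add(key)) {
                duplicates.add(r);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int[][] edges = { { 3, 7 }, { 1, 2 }, { 7, 3 }, { 2, 5 } };
        System.out.println("Duplicate rows: " + findDuplicates(edges));
    }
}
```

If this reports nothing for the real input, the duplicates must come from somewhere else, which would support the request for a more explicit error message.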

Clustering with normalization methods

Hello Mr. Traag,

Thank you very much for making your implementation of the Leiden algorithm available.

I'm using this package to cluster COVID-19 co-occurrence data.

Firstly, I used the jar archive to cluster my documents. I observed that I obtain the same results as VOSviewer only with non-normalized edges (parameter: -w); I have also written a unit test for that. However, with normalized data as input, I get as many clusters as there are documents, and the clustering quality is poor. The input data and the VOSviewer output are linked below.

In version 1.0, does the jar only cluster data with non-normalized edges?

After that, I analyzed the code. I saw that raw (non-normalized) edge weights are expected as input, and that some methods are not used (e.g. createNormalizedNetworkUsingAssociationStrength).

I plan to develop this functionality with the following steps:

1. Add a new parameter after -w taking one of the values {No normalization, Association strength, Fractionalization, Lin/log modularity}, as in VOSviewer.
2. Modify the RunNetworkClustering class, in particular the readEdgeList method.
3. Use the already available methods of the Network class (e.g. createNormalizedNetworkUsingAssociationStrength).

Could you please give me some advice on the implementation I am planning?

Thank you very much.

Technical details:
RunNetworkClustering.jar version 1.0.0
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Windows 10 Professional

Link to data:
https://drive.switch.ch/index.php/s/FxICtNgJO8B7s74
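For reference, a minimal sketch of association-strength normalization in its usual form c_ij / (s_i s_j), where c_ij is the raw edge weight and s_i the total edge weight of node i. The class name is hypothetical, constant scaling factors that specific tools may apply are omitted, self-loops are assumed absent, and this is not necessarily identical to what createNormalizedNetworkUsingAssociationStrength computes. Because such normalized weights are usually far smaller than raw co-occurrence counts, a resolution chosen for raw weights may, under CPM, leave every node in its own cluster, which may be related to the one-cluster-per-document result described above.

```java
// Association-strength normalization for an undirected edge list
// without self-loops: each weight c_ij becomes c_ij / (s_i * s_j).
class AssociationStrength {
    static double[] normalize(int nNodes, int[][] edges, double[] weights) {
        // s[i] = total edge weight of node i
        double[] s = new double[nNodes];
        for (int e = 0; e < edges.length; e++) {
            s[edges[e][0]] += weights[e];
            s[edges[e][1]] += weights[e];
        }
        double[] normalized = new double[edges.length];
        for (int e = 0; e < edges.length; e++) {
            normalized[e] = weights[e] / (s[edges[e][0]] * s[edges[e][1]]);
        }
        return normalized;
    }

    public static void main(String[] args) {
        double[] n = normalize(3, new int[][] { { 0, 1 }, { 1, 2 } }, new double[] { 2.0, 4.0 });
        System.out.println(java.util.Arrays.toString(n));
    }
}
```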

neighbors array must not include duplicate values.

Why does it report "Error while creating network: For each node, corresponding elements of neighbors array must not include duplicate values."? I believe there are no duplicate values in the neighbors array.

Effect of edge weights in resulting communities

Hi,

Thank you for developing this nice library! I have been trying to understand how weights affect the resulting detected communities and I found something which I did not expect.

Using the example graph you provided in the README, I created two weighted versions, v1 and v2; the edge weights of one version are a constant multiple of those of the other.

I ran the Leiden algorithm with the following options:

    java -cp networkanalysis-1.1.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering -r 0.2 -w

For v1, the same communities are detected as in the README example.

For v2, all nodes end up in a single community.

I found this a bit surprising, as I didn't expect the scale of the weights to be so important and I expected to obtain the same results for these two networks.

I can see that in v1.1 of the library two normalization methods have been introduced for the edge weights, but I couldn't find an explanation of why or when they might be needed and how the different normalization methods might affect the result.

Would it be possible to provide a brief explanation for this?

Thank you.

Best,
Paula Tataru
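Assuming the default CPM quality function (no -q option was passed), the effect of scaling all edge weights by a constant α > 0 can be worked out directly. With adjacency weights A_ij, node weights n_i (1 for every node unless set otherwise), and resolution γ:

```latex
\mathcal{H}(\sigma;\gamma) = \sum_{ij} \left( A_{ij} - \gamma\, n_i n_j \right) \delta(\sigma_i, \sigma_j)

\sum_{ij} \left( \alpha A_{ij} - \gamma\, n_i n_j \right) \delta(\sigma_i, \sigma_j)
  = \alpha \sum_{ij} \left( A_{ij} - \frac{\gamma}{\alpha}\, n_i n_j \right) \delta(\sigma_i, \sigma_j)
  = \alpha\, \mathcal{H}\!\left(\sigma; \frac{\gamma}{\alpha}\right)
```

Since α > 0, scaled weights at resolution γ rank partitions exactly as the original weights at resolution γ/α. If v2 carries the larger weights, -r 0.2 on v2 behaves like a much lower resolution on v1, which favors larger clusters and is consistent with all nodes ending up in one community. For modularity, k_i, k_j, and 2m all scale by α, so the null-model term γ k_i k_j / (2m) scales by α as well and the optimal partition does not change.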

Seed doesn't affect the results

I tried to use the seed option, but the results differ slightly on every run. Could you please help me?

    # Approximate k-nearest neighbors (HNSW)
    all_knn <- RcppHNSW::hnsw_knn(expression_scaled, k = k, distance = 'l2',
                                  n_threads = n_threads)
    ind <- all_knn$idx

    # Parallel Jaccard metric
    links <- FastPG::rcpp_parallel_jce(ind)

    links <- FastPG::dedup_links(links)
    # Convert R's 1-based node indices to 0-based node IDs
    links[,1] <- links[,1] - 1
    links[,2] <- links[,2] - 1

    jar_path <- "networkanalysis-1.1.0.jar"
    network_path <- paste0(tempdir(), "/network.txt")
    clusters_path <- paste0(tempdir(), "/clusters.txt")
    # scipen = 10 discourages scientific notation (e.g. 1e+05) in the written file
    withr::with_options(c(scipen = 10), write.table(links, network_path, row.names = FALSE, col.names = FALSE, sep = "\t"))
    system(paste("java -Xmx30g -cp", jar_path, "nl.cwts.networkanalysis.run.RunNetworkClustering -q Modularity --seed 5024 --weighted-edges -o", clusters_path, network_path))
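A fixed --seed can only make the output reproducible if the input network is identical on every run. Whether a multithreaded approximate nearest-neighbor search such as hnsw_knn returns exactly the same neighbors each time is an assumption worth checking rather than a known cause. The hypothetical EdgeListComparer below compares two network files written on different runs, ignoring row order and whitespace differences; if they differ, the variation comes from the input rather than from the clustering.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EdgeListComparer {
    // Normalizes whitespace in each non-empty line and sorts the rows, so two
    // edge lists with the same rows in a different order compare as equal.
    static List<String> canonical(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty()) {
                out.add(String.join("\t", t.split("\\s+")));
            }
        }
        Collections.sort(out);
        return out;
    }

    static boolean sameEdgeList(List<String> a, List<String> b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EdgeListComparer network_run1.txt network_run2.txt
        List<String> first = Files.readAllLines(Paths.get(args[0]));
        List<String> second = Files.readAllLines(Paths.get(args[1]));
        System.out.println(sameEdgeList(first, second) ? "identical networks" : "networks differ");
    }
}
```

Because tempdir() changes between R sessions, the network file from each run would need to be copied elsewhere before comparing.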

Can't get results

Hello! I ran the jar file with the following command:

    java -cp networkanalysis-1.1.0.jar nl.cwts.networkanalysis.run.RunNetworkClustering -r 1 -o out.txt input.txt

and I get this error:

    Error while creating network: Elements of neighbors array must have non-negative values.

Could you please help out? Thank you!
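The error message says that some node ID ended up negative, so a first step is to locate rows in input.txt whose first two columns are not non-negative integers. The hypothetical EdgeListValidator below is a minimal sketch of such a scan; it does not reproduce the tool's own parsing rules, and it also flags non-integer IDs (for example a header row), which would typically cause a different error.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class EdgeListValidator {
    // Describes every line (1-based line numbers) whose first two columns
    // are missing, negative, or not integers. Blank lines are skipped.
    static List<String> problems(Reader in) {
        List<String> found = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.isBlank()) continue;
                String[] f = line.trim().split("\\s+");
                if (f.length < 2) {
                    found.add(lineNo + ": fewer than two columns");
                    continue;
                }
                for (int c = 0; c < 2; c++) {
                    try {
                        if (Integer.parseInt(f[c]) < 0) {
                            found.add(lineNo + ": negative node ID " + f[c]);
                        }
                    } catch (NumberFormatException ex) {
                        found.add(lineNo + ": not an integer node ID '" + f[c] + "'");
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EdgeListValidator input.txt
        problems(new FileReader(args[0])).forEach(System.out::println);
    }
}
```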
