
EasyGraph is an open source network analysis library, which covers advanced network processing methods in structural hole spanners detection, network embedding and several classic methods.

Home Page: https://easy-graph.github.io/

License: BSD 3-Clause "New" or "Revised" License

Python 91.59% C++ 7.95% C 0.26% Shell 0.16% Batchfile 0.03% Makefile 0.01%
network-analysis multiprocessing-optimization structural-hole-theory python

easy-graph's Introduction

EasyGraph

Copyright (C) <2020-2024> by DataNET Group, Fudan University



Introduction

EasyGraph is an open-source network analysis library. It is mainly written in Python and supports analysis for undirected networks and directed networks. EasyGraph supports various formats of network data and covers a series of important network analysis algorithms for community detection, structural hole spanner detection, network embedding, and motif detection. Moreover, EasyGraph implements some key elements using C++ and introduces multiprocessing optimization to achieve better efficiency.

New Features in Version 1.1

  • Support for more hypergraph metrics and algorithms, such as hypercoreness, vector-centrality, and s-centrality.
  • Support for more hypergraph datasets. Both static and dynamic hypergraph datasets can be loaded by calling the corresponding dataset name.
  • Support for more flexible dynamic hypergraph visualization. Users can define dynamic hypergraphs and visualize the structure of the hypergraph at each timestamp.
  • Support for more efficient hypergraph computation and hypergraph learning, by adopting suitable storage structures and caching strategies for different metrics and hypergraph neural networks.

If you need more details, please see our documentation of the latest version.

News

  • [04-09-2024] We released EasyGraph 1.2! This version now fully supports Python 3.12.
  • [03-06-2024] We received the Shanghai Open Source Innovation Outstanding Achievement Award (Grand Prize)!
  • [02-05-2024] We released EasyGraph 1.1! This version features hypergraph analysis and learning for higher-order network modeling and representation.
  • [08-17-2023] We released EasyGraph 1.0!
  • [08-08-2023] Our paper "EasyGraph: A Multifunctional, Cross-Platform, and Effective Library for Interdisciplinary Network Analysis" was accepted by Patterns (Cell Press)!


Install

  • Prerequisites

3.8 <= Python <= 3.12 is required.

  • Installation with pip
    $ pip install --upgrade Python-EasyGraph

The conda package is no longer updated or maintained.

If you've installed EasyGraph this way before, please uninstall it with conda and install it with pip.

If prebuilt EasyGraph wheels are not supported for your platform (OS / CPU arch, check here), you can build it locally this way:

    git clone https://github.com/easy-graph/Easy-Graph && cd Easy-Graph && git checkout pybind11
    pip install pybind11
    python3 setup.py build_ext
    python3 setup.py install
  • Hint

    EasyGraph uses 1.12.1 <= PyTorch < 2.0 for machine learning functions. If PyTorch is not installed in your environment, you can still run the non-machine-learning functions normally, but you will receive warnings about the modules that are unavailable because they depend on it.

Simple Example

This example shows the general usage of methods in EasyGraph.

  >>> import easygraph as eg
  >>> G = eg.Graph()
  >>> G.add_edges([(1,2), (2,3), (1,3), (3,4), (4,5), (3,5), (5,6)])
  >>> eg.pagerank(G)
  {1: 0.14272233049003707, 2: 0.14272233049003694, 3: 0.2685427766200994, 4: 0.14336430577918527, 5: 0.21634929087322705, 6: 0.0862989657474143}

This is a simple example for the detection of structural hole spanners using the HIS algorithm.

  >>> import easygraph as eg
  >>> G = eg.Graph()
  >>> G.add_edges([(1,2), (2,3), (1,3), (3,4), (4,5), (3,5), (5,6)])
  >>> _, _, H = eg.get_structural_holes_HIS(G, C=[frozenset([1,2,3]), frozenset([4,5,6])])
  >>> H # The structural hole score of each node. Note that nodes `4` and `5` receive the highest scores, making them the most likely structural hole spanners.
  {1: {0: 0.703948974609375},
   2: {0: 0.703948974609375},
   3: {0: 1.2799804687499998},
   4: {0: 1.519976806640625},
   5: {0: 1.519976806640625},
   6: {0: 0.83595703125}
  }

Citation

If you use EasyGraph in a scientific publication, we would appreciate citations to the following paper:

  @article{gao2023easygraph,
      title={{EasyGraph: A Multifunctional, Cross-Platform, and Effective Library for Interdisciplinary Network Analysis}},
      author={Min Gao and Zheng Li and Ruichen Li and Chenhao Cui and Xinyuan Chen and Bodian Ye and Yupeng Li and Weiwei Gu and Qingyuan Gong and Xin Wang and Yang Chen},
      year={2023},
      journal={Patterns},
      volume={4},
      number={10},
      pages={100839},
  }

easy-graph's People

Contributors

chenyang03, coreturn, gudauu, icypole, lqoyvle, luowangzi7, mgao97, peppasaur, sds131, tddschn, unparalleled-calvin, wenwen0702, yizhihenpidehou


easy-graph's Issues

Add a CONTRIBUTING file

By GitHub convention, the CONTRIBUTING file specifies the guidelines for contribution, which makes it easier for potential collaborators to contribute to EasyGraph.

Example outline of the CONTRIBUTING file:

  • Coding style (e.g. PEP8) and the formatter to use (yapf, black, autopep8)
  • Issue labeling rules
  • Ways to contribute

Also consider adding a Code of Conduct file if this repo becomes popular. :)

Support for computing graph metrics on a subset of nodes rather than all nodes in a graph

I think this is a common problem for several graph metrics and could be easily supported.

Take node degree as an example. Suppose we have a graph with 100k nodes and 500k edges. I just want to know the degree values of 100 nodes within that graph.

One direct way is to just compute the degree values of these 100 nodes. Existing implementations seem to require computing all the nodes and then selecting the values corresponding to the desired nodes.
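The idea can be sketched with a plain adjacency dictionary instead of EasyGraph's own classes. The names `build_adjacency` and `degrees_for` are hypothetical helpers made up for this illustration and are not part of the EasyGraph API. The point is only that the lookup cost scales with the number of requested nodes, not with the size of the graph.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Adjacency = Dict[Hashable, Set[Hashable]]


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Adjacency:
    """Build an undirected adjacency mapping from an edge list."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees_for(adj: Adjacency, nodes: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Return the degree of each requested node only.

    The work is proportional to the number of requested nodes,
    not to the total number of nodes in the graph.
    """
    return {n: len(adj[n]) for n in nodes}


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
adj = build_adjacency(edges)
subset_degrees = degrees_for(adj, [3, 6])
print(subset_degrees)
```

The same pattern, an optional node list that restricts the computation, would apply to any metric that can be evaluated per node from local information.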

Add the time complexities

  • I have searched the issues of this repo and believe that this is not a duplicate.

Issue

I am wondering if you can add the calculation complexity (something like O(N^2)) in the documentation.
It would provide programmers with more information about how to pick an appropriate function.

Support for creating graph object with node attrs. and edge attrs.

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

  1. Hi all, I am a little curious about the support for creating graph objects with node attributes and edge attributes.
  2. Does EasyGraph support other graph types such as dynamic graphs and some high-order networks, such as hypergraphs?

Thanks a lot!

Using constraint and hierarchy on GraphC: AttributeError: 'GraphC' object has no attribute 'is_multigraph'

An error was encountered when running eg.constraint and eg.hierarchy on a GraphC object representing the karate club dataset.

Info

OS: macOS Monterey 12.4, Intel

python: CPython 3.9.13

Python 3.9.13 (main, May 24 2022, 21:28:31) 
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin

python-easygraph: 0.2a38

To reproduce the error, run

## Setup
git clone https://github.com/tddschn/easygraph-test.git
cd easygraph-test

# install pinned dependencies
# CPython >=3.9,<3.10 is required
poetry install # install poetry first: https://python-poetry.org/
# activate venv
poetry shell

## Run tests

poetry run pytest

Error when running the above tests

AttributeError: 'GraphC' object has no attribute 'is_multigraph'

Set up CI to test across different python versions and operating systems

To ensure that changes don't break existing features, it is good practice to set up a CI run that is triggered on every pull request and every commit to the master branch, providing automated testing across all supported Python versions and operating systems. Commits to master would be blocked until all the tests have passed.

See this guide for Building and testing Python on GitHub Actions.

Questions:

  • How much test coverage are we looking at here?
  • Which features require more attention and tests?
  • Which testing framework should we use (pytest, unittest)?
  • How do we test and keep track of the performance of the functions? Do we compare performance with that of popular libraries?
  • How do we test ML related features like graph embedding in a deterministic way so that the test case runs produce the same results?
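On the last question, one common approach is to pass an explicitly seeded random number generator into every stochastic step. Below is a minimal sketch using only the standard library. `random_walk` is a hypothetical stand-in for the sampling step of an embedding method, not an EasyGraph function, and a real test would also need to seed whichever numerical or ML libraries are involved.

```python
import random


def random_walk(adj, start, length, rng):
    """Hypothetical stand-in for the sampling step of an embedding method."""
    walk = [start]
    for _ in range(length - 1):
        # Sort the neighbors so that set iteration order cannot affect the result.
        neighbors = sorted(adj[walk[-1]])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

# Two generators built from the same seed yield identical walks.
walk_a = random_walk(adj, 1, 10, random.Random(42))
walk_b = random_walk(adj, 1, 10, random.Random(42))
assert walk_a == walk_b
```

Passing the generator as an argument, rather than seeding the global `random` module, keeps tests independent of each other and of execution order.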

Conda installation doesn't work on macOS with python3.9.10

OS: macOS Monterey 12.3
CPU: Intel

$ which python3
/usr/local/Caskroom/miniconda/base/envs/lab/bin/python3

$ python3 --version
Python 3.9.10

$ conda install -c fudanmsn Python-EasyGraph
...
# installed easygraph from
# + python-easygraph              0.2a35  py39_0                  fudanmsn/osx-64           3MB
...

$ python3 -c 'import easygraph'
^CTraceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/easygraph/__init__.py", line 8, in <module>
    import easygraph.functions
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/easygraph/functions/__init__.py", line 5, in <module>
    from easygraph.functions.drawing import *
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/easygraph/functions/drawing/__init__.py", line 3, in <module>
    from .plot import *
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/easygraph/functions/drawing/plot.py", line 4, in <module>
    import statsmodels.api as sm
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/statsmodels/api.py", line 105, in <module>
    from .graphics import api as graphics
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/statsmodels/graphics/api.py", line 1, in <module>
    from . import tsaplots as tsa
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/statsmodels/graphics/tsaplots.py", line 11, in <module>
    from statsmodels.tsa.stattools import acf, pacf
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/statsmodels/tsa/stattools.py", line 19, in <module>
    from scipy.signal import correlate
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/scipy/signal/__init__.py", line 329, in <module>
    from ._spectral_py import *
  File "/usr/local/Caskroom/miniconda/base/envs/lab/lib/python3.9/site-packages/scipy/signal/_spectral_py.py", line 8, in <module>
    from ._spectral import _lombscargle
KeyboardInterrupt

Add a test entry point for all tests

What is a test entry point?
You've got tons of test cases, and you want to run a simple pytest all_tests.py to test them all.
Here, all_tests.py is the entry point for all tests.
In all_tests.py, you may include all the tests for classes, readwrite, etc.

Rationale:

  • Ergonomic for local dev and testing
  • Portable to CI

Skipgram parameter needs to be renamed

After installing Easy-Graph with the dependency Gensim>=4.1.2, execution failed with an error saying that the Gensim Word2Vec class does not recognize the skip-gram argument "size". Digging around, I found that the learn_embeddings() function (link below) puts the keyword 'size' into skip_gram_params, whereas the Gensim Word2Vec class now requires a 'vector_size' argument. The parameter must be renamed to solve the issue.

skip_gram_params['size'] = dimensions
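Gensim 4 renamed the `size` argument of `Word2Vec` to `vector_size` (and `iter` to `epochs`). The rename can be sketched as a small translation step applied to the parameter dictionary before it is passed on. `normalize_skip_gram_params` is a hypothetical helper for illustration, not part of EasyGraph or Gensim, and the block deliberately does not import Gensim.

```python
# Pre-Gensim-4 keyword -> Gensim 4 keyword.
RENAMED_KEYWORDS = {"size": "vector_size", "iter": "epochs"}


def normalize_skip_gram_params(params):
    """Return a copy of params with pre-Gensim-4 keywords renamed."""
    normalized = {}
    for key, value in params.items():
        normalized[RENAMED_KEYWORDS.get(key, key)] = value
    return normalized


skip_gram_params = {"size": 128, "window": 5, "min_count": 0}
fixed = normalize_skip_gram_params(skip_gram_params)
print(fixed)
```

The simpler fix in the library itself is to write `skip_gram_params['vector_size'] = dimensions` directly. A translation step like the one above is only useful if both old and new Gensim versions need to be supported.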

Add PyPI API key in repo settings for CI

Who should do it?

The person controlling EasyGraph package on PyPI and the creator of GitHub org easy-graph, who's probably @ICYPOLE.

Steps to take:

Go to https://pypi.org/manage/account/#api-tokens and create a new API token. If you have the project on PyPI already, limit the token scope to just that project. You can call it something like GitHub Actions CI/CD — project-org/project-repo in order for it to be easily distinguishable in the token list. Don’t close the page just yet — you won’t see that token again.

In a separate browser tab or window, go to the Settings tab of your target repository and then click on Secrets in the left sidebar.

Create a new secret called PYPI_API_TOKEN and copy-paste the token from the first step.

Slow performance on Structural Holes

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Version

0.2a39, installed through PyPI and the Linux script in the repo. Tested on Ubuntu 20.04 LTS.

Issue

For HITS and MAX-D structural holes, it took forever to return results for a large graph with 80K relationships.
Is there any way to speed this up? It appears that only a single core is used for processing at a time.

Source data: https://github.com/aritan7/pgraph/raw/main/node10k.csv

import easygraph as eg
from easygraph.functions.community.modularity_max_detection import *
G = eg.Graph()
G.add_edges_from_file(file='node10k.csv')
c = eg.greedy_modularity_communities(G)
maxd = eg.get_structural_holes_MaxD(G,k=50, C=c)

Update:

  1. Using get_structural_holes_HIS() would cause a memory leak.

error about SDNE

when I try to run:
model1 = eg.functions.graph_embedding.sdne.SDNE(G,hidden_size=[256, 128]) # The order of model LINE. 'first','second' or 'all'.
model1.train(batch_size=3000, epochs=40, verbose=2)
y = model1.get_embeddings() # Returns the graph embedding results.

I get an error like:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported numpy type: NPY_INT).

I'm wondering what's wrong?

Proposal: Adopt a Git Workflow

I think it'd be beneficial to adopt a git workflow,
e.g. constantly pushing to dev branch and only periodically merging dev into master.

master is always stable and won't break for the end users.

Commits on master can also be set up to trigger an automatic release to PyPI & conda, via CI pipelines.

Great job! Some personal suggestions.

This work is solid and wonderful! I like it.
I hope some functions designed for trees can be added. For example, given a tree, extract sub-trees according to their name or id, or flatten the tree into a linear vector to compress it. Besides, is it possible to design an efficient recursive tree for BPE (Byte Pair Encoding)? This could greatly improve the processing efficiency of tree-based or NLP-based tasks.
The above are just my personal suggestions for further improvement.
I wish this project gets better and better!

Structural Hole Calculation for Directed Graph

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

The calculation of effective size (a metric from structural hole theory) only supports undirected graphs. However, I think this metric is also useful for directed graphs. The "networkx" package supports this through "networkx.effective_size".
I am opening this issue to request support for computing structural hole theory metrics on directed graphs. Thanks!
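For reference, here is a minimal pure-Python sketch of Burt's effective size on an unweighted directed graph, where a tie between two nodes counts the edges in both directions. It assumes no self-loops and returns 0.0 for an isolated node. The function names are hypothetical and this is neither EasyGraph's nor networkx's implementation.

```python
from collections import defaultdict


def build_directed(edges):
    """Return (successors, predecessors) mappings for a directed edge list."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def effective_size(edges, node):
    """Burt's effective size of `node` in an unweighted directed graph."""
    succ, pred = build_directed(edges)

    def neighbors(u):
        # Contacts reached by an edge in either direction.
        return (succ[u] | pred[u]) - {u}

    def mutual(u, v):
        # Number of directed edges between u and v (0, 1 or 2).
        return int(v in succ[u]) + int(u in succ[v])

    def proportion(u, v, norm):
        # Tie strength u-v scaled by the sum or the max over u's contacts.
        scale = norm(mutual(u, w) for w in neighbors(u))
        return mutual(u, v) / scale if scale else 0.0

    contacts = neighbors(node)
    total = 0.0
    for v in contacts:
        # Share of node's ties that are redundant with contact v.
        redundancy = sum(
            proportion(node, w, sum) * proportion(v, w, max)
            for w in contacts
            if w != v
        )
        total += 1.0 - redundancy
    return total


edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(effective_size(edges, "a"))
```

With no tie between the two contacts, each contact counts fully. Adding the edge between `b` and `c` makes them partly redundant and lowers the effective size of `a`.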

Support parallel execution of functions to make better use of multiple CPU cores

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Issue

I just used the function "eg.functions.not_sorted.pagerank" to calculate the PageRank values of a graph with about 50,000 nodes.
It ran for a long time, so I checked the system status using the command "htop". It showed that the calculation uses only one CPU core and leaves all the other cores idle. I am therefore opening this issue to ask for parallel execution and better use of multiple CPU cores.

Add examples for structural hole & graph properties

It would be better to have two examples. One could be related to structural hole theory, and the other could be related to other graph properties. I feel that some users are not familiar with structural hole theory, so it would be better to have another example for them.

Incorrect Bridge Computation

I believe that the computation of bridges (for undirected graphs) is incorrect. I have the following example, where easygraph fails to correctly compute the bridges:

Consider as undirected graph G a clique of size 4, i.e. with 4 nodes [0, 1, 2, 3] and edges [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)].
Then, we remove edges [(0, 1), (0, 2)] by repeatedly calling G.remove_edge(...).
After removing these edges, node 0 will have a single edge (0, 3) connecting it to the remainder of the graph. Hence, edge (0, 3) must be a bridge.
However, computing the bridges with easygraph.bridges(G) returns an empty set. This is an error.

Here is a minimal test case:

import easygraph as eg
G = eg.Graph()
G.add_edges([ (0, 3), (1, 2), (1, 3), (2, 3), ])
bridges = eg.bridges(G)
assert (0, 3) in bridges
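To cross-check the expected result independently of EasyGraph, the bridges of this graph can be computed with a small reference implementation of the classic DFS low-link method. This is a sketch for simple undirected graphs with comparable node labels. `find_bridges` is a hypothetical helper, not EasyGraph's implementation, and the recursive form is only suitable for small graphs.

```python
def find_bridges(edges):
    """Return the bridges of a simple undirected graph as sorted tuples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    discovery = {}  # DFS discovery time of each visited node
    low = {}        # earliest discovery time reachable from the node's subtree
    bridges = set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        discovery[node] = low[node] = counter
        counter += 1
        for nxt in adj[node]:
            if nxt == parent:
                continue  # do not walk back along the tree edge
            if nxt in discovery:
                # Back edge: the subtree can reach an earlier node.
                low[node] = min(low[node], discovery[nxt])
            else:
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # No back edge from nxt's subtree reaches node or above.
                if low[nxt] > discovery[node]:
                    bridges.add(tuple(sorted((node, nxt))))

    for start in adj:
        if start not in discovery:
            dfs(start, None)
    return bridges


edges = [(0, 3), (1, 2), (1, 3), (2, 3)]
bridges = find_bridges(edges)
print(bridges)
```

On the graph from the test case, nodes 1, 2 and 3 form a triangle, so only the edge attaching node 0 should be reported.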

error about node2vec

when I run:
y,x44 = eg.functions.graph_embedding.node2vec.node2vec(G,
I get the following error:
AttributeError: 'function' object has no attribute 'node2vec'

Documentation page seems broken

On the reference pages in the documentation I don't get any details shown, only headings are visible, e.g.

https://easy-graph.github.io/docs/reference/easygraph.classes.html

only has the following content:

easygraph.classes package
Submodules
easygraph.classes.base module
easygraph.classes.directed_graph module
easygraph.classes.directed_multigraph module
easygraph.classes.graph module
easygraph.classes.graphviews module
easygraph.classes.hypergraph module
easygraph.classes.multigraph module
easygraph.classes.operation module
Module contents
