xiangyue9607 / bionev

221 stars · 11 watchers · 76 forks · 27.68 MB

Graph Embedding Evaluation / Code and Datasets for "Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations" (Bioinformatics 2020)

License: MIT License

Python 100.00%
graph-embedding graph-embeddings-evaluation graph-embedding-methods biomedical-networks node-classification link-prediction network-embedding deepwalk node2vec line-embedding

bionev's People

Contributors

cthoyt, ddomingof, huang2960, xiangyue9607

bionev's Issues

Hyper-parameters for word2vec

Hello,

Thank you for this amazing work!
In the context of my PhD thesis, I need to run some comparisons of your work against some experimental models, and I need some hyper-parameters that I could not find in either the paper or the supplementary materials.

I wanted to ask you which hyper-parameters were used for the different graphs in the skip-gram-based models, specifically:

  1. What is the window size for the context, i.e. how near a node has to be to the central node to be considered part of its context during the embedding process? I see that the default value in your code is 10, but in small connected graphs such as the ones you considered, that would mean every node is contextual to every other node in the same connected component if the small-world hypothesis holds for these graphs.
  2. What loss function was used: an NCE-style loss or a complete softmax? If negative sampling was used, how many negative samples were drawn? (See the sketch after this list for how these settings map onto gensim parameters.)
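
For reference, in gensim-based skip-gram implementations such as the ones OpenNE builds on, these two settings correspond to the window and negative parameters of gensim's Word2Vec (hs=0 selects negative sampling rather than a full softmax). A minimal sketch with hypothetical values, assuming gensim >= 4 and pre-generated random walks; these are not necessarily the settings used in the paper:

    from gensim.models import Word2Vec

    # Hypothetical walk corpus: each random walk is a list of node IDs as strings.
    walks = [["1", "5", "3", "7"], ["2", "4", "6", "8"]]

    model = Word2Vec(
        walks,
        vector_size=128,  # embedding dimension
        window=10,        # context window around the central node (the default asked about)
        sg=1,             # skip-gram
        hs=0,             # 0 = negative sampling rather than hierarchical softmax
        negative=5,       # hypothetical number of negative samples
        min_count=0,
    )
    print(model.wv["1"])  # learned embedding for node "1"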

Thank you and have a nice day,
Luca

which pre-trained word2vec model?

Hi,

I am new to this area and I am confused about the word2vec part of node2vec. As there are different pre-trained word2vec models available, such as generic English and PubMed models, which pre-trained word2vec model was used in this experiment?

Use pre-trained model to compute embedding in test graph

Hi,

Suppose that we have two graphs, namely a training graph and a test graph. I wonder (1) how to train a node2vec (or any other method) model on the training graph and (2) how to later use this model to compute embeddings for the test graph.

The important code chunk goes as follows:

from bionev.OpenNE import node2vec
model = node2vec.Node2vec(graph=g_train, path_length=64, num_paths=32, dim=128, p=1, q=1)

Regards, Andrej
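
For reference, node2vec and the other shallow embedding methods wrapped here are transductive: they only learn vectors for nodes present in the graph they are trained on, so a test graph containing unseen nodes generally requires retraining rather than reusing a fitted model. For held-out edges among the same nodes (the link-prediction setup in BioNEV), you can train on the training graph and look the vectors up afterwards. A minimal sketch, assuming the vendored OpenNE API (models expose a vectors dict and save_embeddings()) and a hypothetical edge-list file:

    # Import paths assume the pip-installable package layout; adjust if running from src/.
    from bionev.utils import read_for_OpenNE
    from bionev.OpenNE import node2vec

    # Load the training graph and fit node2vec (training runs in the constructor).
    g_train = read_for_OpenNE('train.edgelist', weighted=False)
    model = node2vec.Node2vec(graph=g_train, path_length=64, num_paths=32,
                              dim=128, p=1, q=1)

    # Learned vectors, keyed by node ID, can be looked up or written to disk.
    embeddings = model.vectors
    model.save_embeddings('node2vec_train_embeddings.txt')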

AssertionError Problem

When I try to run the code, an AssertionError occurs with the Clin_Term_COOC data. Since I didn't change the code, I wonder what the reason could be.
"assert len(node_list) == len(embedding_look_up)"

testing_ratio

Hi, I tried to change the training/test split to 9:1, but the ratio between the training and test sets did not change. For example, with testing_ratio = 0.1 and 0.2, the ratio of links between the original network and the training network stays the same. How can I solve this? Thank you.

Original Graph: nodes: 1133 edges: 5451
Training Graph: nodes: 1133 edges: 4395

All occurrences of testing_ratio:

Searching 'testing_ratio' in E:\puo\BioNEV-master\src\*.py ...
E:\puo\BioNEV-master\src\evaluation.py: 81: def NodeClassification(embedding_look_up, node_list, labels, testing_ratio, seed):
E:\puo\BioNEV-master\src\evaluation.py: 84: testing_ratio=testing_ratio,seed=seed)
E:\puo\BioNEV-master\src\main.py: 30: parser.add_argument('--testing_ratio', default=0.1, type=float,
E:\puo\BioNEV-master\src\utils.py: 52: def split_train_test_graph(input_edgelist, seed, testing_ratio=0.1, weighted=False):
E:\puo\BioNEV-master\src\utils.py: 60: testing_edges_num = int(len(G.edges) * testing_ratio)
E:\puo\BioNEV-master\src\utils.py: 151: def split_train_test_classify(embedding_look_up, X, Y, seed, testing_ratio=0.1):
E:\puo\BioNEV-master\src\utils.py: 153: training_ratio = 1 - testing_ratio
Hits found: 7
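
Based on the argparse line above, the split is controlled by the --testing_ratio option of main.py and defaults to 0.1, so it only changes when passed explicitly on the command line. A hypothetical invocation (paths are placeholders):

    python main.py --input ./data/your_network.edgelist --output ./embeddings/node2vec.txt --method node2vec --task link-prediction --testing_ratio 0.2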

node2vec is killed in evaluation phase

Hi,

First thanks for a great paper. However, when I try to reproduce your results, the node2vec method is suddenly killed in the evaluation procedure.

I used the following line to start with node2vec embedding:
bionev --input ./data/Clin_Term_COOC/Clin_Term_COOC.edgelist --output ./embeddings/node2vec.txt --method node2vec --task link-prediction --eval-result-file eval_results2.txt --weighted True

The output lines:

######################################################################
Embedding Method: node2vec, Evaluation Task: link-prediction
######################################################################
Original Graph: nodes: 48651 edges: 1659249
Training Graph: nodes: 48651 edges: 1328307
Loading training graph for learning embedding...
Graph Loaded...
Preprocess transition probs...
Begin random walk...
Walk finished...
Learning representation...
Saving embeddings...
Embedding Learning Time: 10034.56 s
Nodes with embedding: 48651
Begin evaluation...
Killed

I have 256 GB of memory on my server, so I suspect that RAM is not the issue. When I tried with a smaller dataset, the evaluation phase ended successfully.

Any idea what to do?

Best, Andrej

Include models from PyKEEN

There are several KGE models implemented in https://github.com/smartDataAnalytics/PyKEEN from @mali-git that cover translational distance models (e.g., TransE, TransH, TransR, TransD, UM, SE) and semantic matching models (e.g., RESCAL, DistMult, ERMLP, ConvE) that weren't mentioned in the README of this repo. I'm also aware that some, but not all, of these models have been made available through the packages that you've already integrated.

PyKEEN was developed with a specific focus on reusability, so I hope that we can either make a PR adding a wrapper so it works the same way as your models, or that you might become interested and make use of it yourselves.

Make code pip installable

It looks like the code is organized such that it could be made pip-installable. This would make dependency management and reusability a lot better. I will submit a PR :)
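
For illustration only, a minimal setup.py sketch assuming a src/ layout with a bionev package and a bionev console script; the package name is taken from the repository, but the version, dependency list, and entry-point module are hypothetical and not necessarily what the eventual PR used:

    from setuptools import setup, find_packages

    setup(
        name='bionev',
        version='0.1.0',                      # hypothetical version
        packages=find_packages(where='src'),  # assumes packages live under src/
        package_dir={'': 'src'},
        install_requires=[
            'networkx', 'numpy', 'scipy', 'gensim', 'scikit-learn',  # illustrative dependency list
        ],
        entry_points={
            # hypothetical entry point; the actual module/function may differ
            'console_scripts': ['bionev = bionev.main:more_main'],
        },
    )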

ValueError problem

When I run the code, a ValueError occurs with the Clin_Term_COOC data:
"multilabel-indicator is not supported"

ValueError: not enough values to unpack (expected 3, got 2)

I get an error when running SDNE and DeepWalk (OpenNE), as follows:

Original Graph: nodes: 1133 edges: 5451
Training Graph: nodes: 1133 edges: 3874
Loading training graph for learning embedding...
Traceback (most recent call last):
  File "main.py", line 198, in <module>
    more_main()
  File "main.py", line 194, in more_main
    main(parse_args())
  File "main.py", line 126, in main
    embedding_training(args, train_graph_filename)
  File "F:\BioNEV-master\src\embed_train.py", line 25, in embedding_training
    g = read_for_OpenNE(train_graph_filename, weighted=args.weighted)
  File "F:\BioNEV-master\src\utils.py", line 17, in read_for_OpenNE
    G.read_edgelist(filename=filename, weighted=weighted)
  File "F:\BioNEV-master\src\OpenNE\graph.py", line 79, in read_edgelist
    func(l)
  File "F:\BioNEV-master\src\OpenNE\graph.py", line 65, in read_weighted
    src, dst, w = l.split()
ValueError: not enough values to unpack (expected 3, got 2)

How can I deal with this? Thanks.
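
The traceback points at OpenNE's weighted edge-list reader, which expects three whitespace-separated fields per line (source, target, weight) but found only two; this typically happens when --weighted True is used with an edge list that has no weight column. An illustrative check of which format a file actually has (the file name is hypothetical):

    # Collect the number of fields per line: {2} means "src dst" (run without --weighted),
    # {3} means "src dst weight" (run with --weighted True).
    with open('train_graph.edgelist') as f:
        field_counts = {len(line.split()) for line in f if line.strip()}
    print(field_counts)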

struc2vec

When I ran struc2vec, I encountered the following problem:
######################################################################
Original Graph: nodes: 1133 edges: 5451
Training Graph: nodes: 1133 edges: 4923
Loading training graph for learning embedding...
Graph Loaded...
Traceback (most recent call last):
  File "main.py", line 197, in <module>
    more_main()
  File "main.py", line 193, in more_main
    main(parse_args())
  File "main.py", line 125, in main
    embedding_training(args, train_graph_filename)
  File "E:\puo\BioNEV-master\src\bionev\embed_train.py", line 27, in embedding_training
    _embedding_training(args, G=g)
  File "E:\puo\BioNEV-master\src\bionev\embed_train.py", line 37, in _embedding_training
    format='%(asctime)s %(message)s')
  File "E:\puo\lib\logging\__init__.py", line 1808, in basicConfig
    h = FileHandler(filename, mode)
  File "E:\puo\lib\logging\__init__.py", line 1032, in __init__
    StreamHandler.__init__(self, self._open())
  File "E:\puo\lib\logging\__init__.py", line 1061, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'E:\puo\BioNEV-master\src\src\bionev\struc2vec\struc2vec.log'

How can I solve it? Thank you
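
The FileNotFoundError comes from logging.basicConfig trying to open a FileHandler for struc2vec.log inside a directory that does not exist; the doubled src\src in the path suggests the relative log path does not match the working directory the script was launched from. An illustrative workaround, using the path from the traceback (adjust to your checkout), is to create the directory before training:

    import os

    # Path copied from the traceback; hypothetical for any other setup.
    log_dir = r'E:\puo\BioNEV-master\src\src\bionev\struc2vec'
    os.makedirs(log_dir, exist_ok=True)  # ensure the directory exists before logging opens the file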

Reproduction question: how were the datasets generated?

This repository includes five nice datasets, which make reproduction much easier, but it's missing whatever scripts (or instructions, if done manually) were used to generate them. These would be helpful not only to ensure the correctness of the training data, but also to enable periodic reproduction as the underlying databases are updated.

STRING_PPI

Dear Xiang Yue,
Could you please tell me which protein-protein interaction data file from the STRING website you used to generate the edgelist? Is it 9606.protein.links.v10.5.txt.gz?
Thank you!

edge duplication

Hi @xiangyue9607 Thanks for releasing these datasets.

Recently I tried to use Mashup_PPI for some experiments and found that there are many duplicate edges in the edge list. Perhaps it needs to be cleaned.
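
An illustrative way to count duplicate edges in an edge-list file; the file name is hypothetical, and sorting the endpoints assumes the graph is undirected so that "a b" and "b a" count as the same edge:

    from collections import Counter

    # Normalize each edge to a sorted (u, v) pair, ignoring any weight column.
    with open('Mashup_PPI.edgelist') as f:
        edges = [tuple(sorted(line.split()[:2])) for line in f if line.strip()]

    duplicates = {e: n for e, n in Counter(edges).items() if n > 1}
    print(len(duplicates), 'distinct edges appear more than once')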
