subgraph2vec_tf's Introduction

subgraph2vec

This repository contains the TensorFlow implementation of our paper "subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs". The paper can be found at: https://arxiv.org/pdf/1606.08928.pdf

Dependencies

This code was developed in Python 2.7. It has been run and tested on Ubuntu 14.04 and 16.04. It uses the following Python packages:

  1. tensorflow (version == 0.12.1)
  2. networkx (version <= 1.11)
  3. joblib (version <= 0.11)
  4. scikit-learn (+scipy, +numpy)

The procedure for setting up subgraph2vec is as follows:

  1. Clone the repository (command: git clone https://github.com/MLDroid/subgraph2vec_tf.git)
  2. Untar the data.tar.gz tarball

The procedure for obtaining rooted subgraph vectors using subgraph2vec and performing graph classification is as follows:

  1. Move to the folder "src" (command: cd src). Also make sure that the datasets from the KDD 2015 paper (Deep Graph Kernels) are available in '../data/kdd_datasets/dir_graphs/'.
  2. Run main.py --corpus <dataset of graph files> --class_labels_file_name <file containing class labels of graphs to be used for graph classification> to:
	* Generate the Weisfeiler-Lehman kernel's rooted subgraphs from all the graphs
	* Train a skipgram model to learn subgraph embeddings, which are dumped in the ../embeddings/ folder
	* Perform graph classification using the graph kernel and the deep graph kernel
  3. Examples:
	* python main.py --corpus ../data/kdd_datasets/mutag --class_labels_file_name ../data/kdd_datasets/mutag.Labels
	* python main.py --corpus ../data/kdd_datasets/proteins --class_labels_file_name ../data/kdd_datasets/proteins.Labels --batch_size 16 --embedding_size 128 --num_negsample 5
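
To make the first step of the pipeline more concrete, here is a minimal sketch of Weisfeiler-Lehman relabeling on a toy graph. It is an illustration only and not the code in this repository: the function name `wl_rooted_subgraphs`, the adjacency-dict input, and the uncompressed string labels are all assumptions. Each node's label at degree d is built from its own label and the sorted labels of its neighbours at degree d-1, so the label identifies the rooted subgraph of that degree around the node.

```python
def wl_rooted_subgraphs(adjacency, labels, height):
    """Return, for every node, its WL labels for degrees 0..height.

    adjacency: dict mapping node -> list of neighbour nodes (hypothetical input format)
    labels:    dict mapping node -> initial node label
    """
    current = {node: str(label) for node, label in labels.items()}
    history = {node: [current[node]] for node in adjacency}
    for _ in range(height):
        updated = {}
        for node, neighbours in adjacency.items():
            # Sorting makes the new label independent of neighbour order.
            neighbour_labels = sorted(current[n] for n in neighbours)
            updated[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = updated
        for node in adjacency:
            history[node].append(current[node])
    return history


# Toy path graph a - b - c with made-up node labels.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
labels = {"a": "C", "b": "N", "c": "C"}
subgraphs = wl_rooted_subgraphs(adjacency, labels, height=2)
for node in sorted(subgraphs):
    print(node, subgraphs[node])
```

In this sketch the `height` argument plays the role that --wlk_h plays on the command line. Nodes a and c end up with identical label sequences because their neighbourhoods are structurally the same.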

Other command line args:

optional arguments:
	-h, --help            show this help message and exit
	-c CORPUS, --corpus CORPUS
			        Path to directory containing graph files to be used
			        for graph classification or clustering
	-l CLASS_LABELS_FILE_NAME, --class_labels_file_name CLASS_LABELS_FILE_NAME
			        File name containing the name of the sample and the
			        class labels
	-o OUTPUT_DIR, --output_dir OUTPUT_DIR
			        Path to directory for storing output embeddings
	-b BATCH_SIZE, --batch_size BATCH_SIZE
			        Number of samples per training batch
	-e EPOCHS, --epochs EPOCHS
			        Number of iterations the whole dataset of graphs is
			        traversed
	-d EMBEDDING_SIZE, --embedding_size EMBEDDING_SIZE
			        Intended subgraph embedding size to be learnt
	-neg NUM_NEGSAMPLE, --num_negsample NUM_NEGSAMPLE
			        Number of negative samples to be used for training
	-lr LEARNING_RATE, --learning_rate LEARNING_RATE
			        Learning rate to optimize the loss function
	--n_cpus N_CPUS       Maximum no. of cpu cores to be used for WL kernel
			        feature extraction from graphs
	--wlk_h WLK_H         Height of WL kernel (i.e., degree of rooted subgraph
			        features to be considered for representation learning)
	-lf LABEL_FILED_NAME, --label_filed_name LABEL_FILED_NAME
			        Label field to be used for coloring nodes in graphs
			        using WL kernel
	-v VALID_SIZE, --valid_size VALID_SIZE
			        Number of samples to validate training process from
			        time to time
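
When running several datasets or settings in a row, it can be convenient to assemble the command line programmatically. The helper below is a hypothetical convenience (`build_command` is not part of this repository); it only uses the long option names listed above and the values from the proteins example.

```python
def build_command(corpus, class_labels_file_name, **options):
    """Assemble an argv list for main.py from the documented long options."""
    command = ["python", "main.py",
               "--corpus", corpus,
               "--class_labels_file_name", class_labels_file_name]
    # Sort so the generated command is the same on every run.
    for name, value in sorted(options.items()):
        command += ["--" + name, str(value)]
    return command


command = build_command(
    "../data/kdd_datasets/proteins",
    "../data/kdd_datasets/proteins.Labels",
    batch_size=16, embedding_size=128, num_negsample=5,
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.call from inside the src folder.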

Contact

In case of queries, please email: [email protected] OR [email protected]

Reference

Please consider citing the following paper when you use this code.
@article{narayanansubgraph2vec,
  title={subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs},
  author={Narayanan, Annamalai and Chandramohan, Mahinthan and Chen, Lihui and Liu, Yang and Saminathan, Santhoshkumar}
}

Acknowledgements

Thanks to Zhang Xinyi (https://github.com/XinyiZ001) for the support on coding/testing subgraph2vec TF version.

subgraph2vec_tf's Issues

System not Loading the graph files.

The following error occurs when I apply subgraph2vec to my dataset. I have kept the Python packages at the specific versions mentioned above.

No handlers could be found for logger "root"
INFO:root:Loaded 59 graph file names form /home/ksec/AMDS/tmp1/cumul
ERROR:root:unable to load graph from file: /home/ksec/AMDS/tmp1/cumul/00001AC7364E668F1DDC9906887EDF0C8F7230830B864DB6179075AC9C0A6892.gexf
ERROR:root:unable to load graph from file: /home/ksec/AMDS/tmp1/cumul/000014F7037586315DE348D21337B90B83A1C887E247DA8E4CC043702E36DFBA.gexf
ERROR:root:unable to load graph from file: /home/ksec/AMDS/tmp1/cumul/00041EC3C57ED823C14129F60DC5B0BCDB6175EF286F85DD25B5BAA317B0EB38.gexf
......
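
The log above does not say why loading failed. One way to narrow it down, sketched here under assumptions, is to check whether each file is well-formed XML with a gexf root element before handing it to the graph loader. The function name `inspect_gexf` is hypothetical and uses only the Python standard library; it does not reproduce the loader used in this repository.

```python
import xml.etree.ElementTree as ET


def inspect_gexf(path):
    """Return (ok, detail) describing whether `path` parses as XML with a gexf root."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, IOError) as error:
        return False, "%s: %s" % (type(error).__name__, error)
    # Namespaced tags look like '{namespace}gexf'; split the two parts.
    namespace, _, local_name = root.tag.rpartition("}")
    namespace = namespace.lstrip("{")
    if local_name != "gexf":
        return False, "root element is <%s>, not <gexf>" % local_name
    return True, "version=%s namespace=%s" % (root.get("version"), namespace)
```

Calling inspect_gexf on each failing path and printing the result shows whether the problem is a missing file, malformed XML, or an unexpected root element or version.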

Does it really work? Is it better than a randomly generated vector?

import numpy as np

def get_subgraph_embedding(embeddings_model_file):
    word_embeddings = {}
    print("Loading embedding model %s" % embeddings_model_file)
    with open(embeddings_model_file) as fh:
        # The first line is a header: "<number of subgraphs> <embedding dims>"
        embedding_word_size, embedding_dims = fh.readline().rstrip().split()
        embedding_word_size = int(embedding_word_size)
        embedding_dims = int(embedding_dims)
        print('sub2Vec Info : %d X %d' % (embedding_word_size, embedding_dims))

        for line in fh:
            values = line.rstrip().split()
            word = values[0]
            # Learned vector, disabled for this experiment:
            # embedding = np.asarray(values[1:], dtype=np.float64)
            # Replace it with Gaussian noise seeded from the word. The mask keeps
            # the seed in the 32-bit range that np.random.seed accepts. Note that
            # Python 3 salts str hashes per process unless PYTHONHASHSEED is set.
            rand_seed = hash(word) & 0xFFFFFFFF
            np.random.seed(rand_seed)
            embedding = np.random.randn(embedding_dims)
            word_embeddings[word] = embedding

    return embedding_word_size, embedding_dims, word_embeddings

I ported the code to Python 3, replaced the learned embeddings with randomly generated vectors, and got better results.
So I'm trying to keep thinking scientifically.
If subgraphs are used to encode the full graph, does the vector of each subgraph actually matter? That is, does it make a difference whether we get it from skipgram or generate it randomly?
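
For a controlled comparison of the two settings, it may help to load both from the same file with one switch. The sketch below is an assumption-laden variant of the function above, not code from the repository: the name `load_embeddings` and the `use_random` flag are hypothetical, the file layout (a "<count> <dims>" header followed by "<word> v1 ... vd" rows) is inferred from that function, and it uses only the standard library. The random seed comes from zlib.crc32 so that each subgraph gets the same random vector on every run.

```python
import random
import zlib


def load_embeddings(path, use_random=False):
    """Read a '<count> <dims>' header followed by '<word> v1 ... vd' rows.

    With use_random=True the stored values are ignored and replaced by
    Gaussian noise seeded from the word itself.
    """
    embeddings = {}
    with open(path) as handle:
        count, dims = (int(field) for field in handle.readline().split())
        for line in handle:
            values = line.split()
            if not values:
                continue  # tolerate blank lines
            word = values[0]
            if use_random:
                # crc32 is stable across runs, unlike Python 3's salted hash().
                seed = zlib.crc32(word.encode("utf-8")) & 0xFFFFFFFF
                rng = random.Random(seed)
                vector = [rng.gauss(0.0, 1.0) for _ in range(dims)]
            else:
                vector = [float(value) for value in values[1:]]
            embeddings[word] = vector
    return count, dims, embeddings
```

Running the same classification step once with use_random=False and once with use_random=True, over the same folds, keeps everything except the vectors fixed.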
