simplicial_neural_networks's Introduction

Simplicial Neural Networks

Stefania Ebli, Michaël Defferrard, Gard Spreemann

We present simplicial neural networks (SNNs), a generalization of graph neural networks to data that live on a class of topological spaces called simplicial complexes. These are natural multi-dimensional extensions of graphs that encode not only pairwise relationships but also higher-order interactions between vertices—allowing us to consider richer data, including vector fields and n-fold collaboration networks. We define an appropriate notion of convolution that we leverage to construct the desired convolutional neural networks. We test the SNNs on the task of imputing missing data on coauthorship complexes.
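
The convolution underlying SNNs is, in essence, a learnable polynomial of the degree-k Laplacian applied to k-cochains. The following is a minimal PyTorch sketch of such a polynomial filter, written under the assumption that the Laplacian is provided as a sparse tensor; the class name, arguments, and initialization are illustrative, not the code shipped in this repository.

    import torch

    class SimplicialConvolution(torch.nn.Module):
        """Polynomial filter of a degree-k Laplacian acting on k-cochains (sketch).

        Output: sum_i (L^i @ cochain) @ theta_i, with learnable theta_i.
        Names and shapes are illustrative assumptions, not the repository's API.
        """

        def __init__(self, degree, in_channels, out_channels):
            super().__init__()
            # One weight matrix per power of the Laplacian.
            self.theta = torch.nn.Parameter(
                0.01 * torch.randn(degree + 1, in_channels, out_channels)
            )

        def forward(self, laplacian, cochain):
            # laplacian: sparse (n_simplices, n_simplices) degree-k Laplacian
            # cochain:   dense  (n_simplices, in_channels) k-cochain features
            out = cochain @ self.theta[0]
            x = cochain
            for i in range(1, self.theta.shape[0]):
                x = torch.sparse.mm(laplacian, x)  # apply L one more time
                out = out + x @ self.theta[i]
            return out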

Installation

Binder: click the Binder badge to run the code from your browser without installing anything.

  1. Clone this repository.

    git clone https://github.com/stefaniaebli/simplicial_neural_networks.git
    cd simplicial_neural_networks
  2. Create the environment.

    CONDA_CHANNEL_PRIORITY=flexible conda env create -f environment.yml
    conda activate snn

Notebooks

Reproducing our results

Run the command below to train an SNN to impute missing data (citations) on a simplicial complex (which encodes collaborations between authors).

    python ./experiments/impute_citations.py ./data/s2_3_collaboration_complex ./experiments/output 150250 30
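
As a rough picture of what the experiment does (a hypothetical sketch, not the script above): the network receives the cochain with some citation values hidden, and the loss is computed only on entries whose values are known, so the trained filter can then be used to predict the hidden ones. The variable names and the choice of loss below are assumptions.

    import torch

    def training_step(model, optimizer, laplacian, masked_cochain, target, known_mask):
        # One imputation step: fit the known citation values only (sketch).
        # target has the same shape as the model output; known_mask is boolean.
        optimizer.zero_grad()
        prediction = model(laplacian, masked_cochain)
        loss = torch.nn.functional.l1_loss(            # loss choice is illustrative
            prediction[known_mask], target[known_mask]
        )
        loss.backward()
        optimizer.step()
        return loss.item()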

Data

The data necessary to reproduce our experiments are found in the ./data/s2_3_collaboration_complex folder. The three stages below will recreate them.

  1. Download the full archive of the Open Research Corpus from Semantic Scholar, version 2018-05-03, which contains over 39 million published research papers in Computer Science, Neuroscience, and Biomedicine.

    wget -i https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/2018-05-03/manifest.txt -P data/s2_1_raw/

    This step populates the ./data/s2_1_raw folder.

  2. Create a bipartite graph, whose vertices are papers (39,219,709 of them) in one part and authors (12,862,455 of them) in the other. A paper is connected to all its co-authors, and an author is connected to all the papers they wrote, leading to 139,268,795 edges. A citation count (the number of times the paper was cited) is available for each paper (from 0 to 37,230 citations per paper).

    # Create a bipartite graph from Semantic Scholar.
    python s2_1_corpus_to_bipartite.py
    # Clean and downsample that bipartite graph.
    python s2_2_downsample_bipartite.py
    # Project the bipartite graph to a graph between authors.
    python s2_3_bipartite_to_graphs.py

    Those steps populate the ./data/s2_2_bipartite_graph folder. Alternatively, that processed data is available at doi:10.5281/zenodo.4144319. A minimal sketch of the bipartite representation and its projection onto authors is given after this list.

  3. Build the collaboration complex (where each collaboration of authors is represented by a simplex) and citation cochains (which are the number of citations attributed to the collaborations).

    # Downsample the bipartite graph to have a connected simplicial complex.
    python s2_4_bipartite_to_downsampled.py
    # From a bipartite graph to a simplicial complex with k-cochains.
    python s2_5_bipartite_to_complex.py
    # From a simplicial complex to k-degree Laplacians.
    python s2_6_complex_to_laplacians.py
    # Artificially insert missing data on k-cochains.
    python s2_7_cochains_to_missingdata.py

    Those steps populate the ./data/s2_3_collaboration_complex folder. A sketch of the Laplacian construction and the missing-data masking also follows the list.
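
To make step 2 concrete, here is a minimal SciPy sketch (with made-up identifiers, not the repository's code) of the bipartite representation: each authorship pair becomes a nonzero entry of a papers-by-authors matrix, and projecting that matrix links two authors whenever they co-wrote a paper.

    import numpy as np
    from scipy import sparse

    # Hypothetical edge list: one (paper_id, author_id) pair per authorship.
    paper_ids = ["p1", "p1", "p2", "p3"]
    author_ids = ["a1", "a2", "a2", "a3"]
    citations = {"p1": 5, "p2": 0, "p3": 12}  # the attribute carried by each paper

    paper_index = {p: i for i, p in enumerate(dict.fromkeys(paper_ids))}
    author_index = {a: j for j, a in enumerate(dict.fromkeys(author_ids))}

    rows = [paper_index[p] for p in paper_ids]
    cols = [author_index[a] for a in author_ids]
    biadjacency = sparse.coo_matrix(
        (np.ones(len(rows)), (rows, cols)),
        shape=(len(paper_index), len(author_index)),
    ).tocsr()

    # Projecting onto authors links two authors whenever they share a paper.
    author_graph = (biadjacency.T @ biadjacency).tocoo()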
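
Step 3 builds, for each degree k, the Hodge Laplacian L_k = B_k^T B_k + B_{k+1} B_{k+1}^T from the boundary matrices of the complex, and hides part of the citation cochain by replacing values with a constant. The sketch below illustrates both operations under assumed NumPy/SciPy representations; the function names and signatures are hypothetical, not the repository's API.

    import numpy as np

    def hodge_laplacian(boundary_k, boundary_kp1):
        # Degree-k Laplacian: B_k^T B_k + B_{k+1} B_{k+1}^T (standard formula).
        # boundary_k maps k-simplices to (k-1)-simplices; pass None at the ends.
        lower = boundary_k.T @ boundary_k if boundary_k is not None else 0
        upper = boundary_kp1 @ boundary_kp1.T if boundary_kp1 is not None else 0
        return lower + upper

    def mask_cochain(values, missing_fraction, fill_value=0.0, seed=0):
        # Artificially hide a fraction of cochain values (illustrative helper).
        rng = np.random.default_rng(seed)
        hidden = rng.random(len(values)) < missing_fraction  # True = hidden entry
        return np.where(hidden, fill_value, values), hidden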

License & citation

The content of this repository is released under the terms of the MIT license. Please cite our paper if you use it.

@inproceedings{snn,
  title = {Simplicial Neural Networks},
  author = {Ebli, Stefania and Defferrard, Michaël and Spreemann, Gard},
  booktitle = {Topological Data Analysis and Beyond workshop at NeurIPS},
  year = {2020},
  archiveprefix = {arXiv},
  eprint = {2010.03633},
  url = {https://arxiv.org/abs/2010.03633},
}

simplicial_neural_networks's People

Contributors

gspr, mdeff, stefaniaebli


simplicial_neural_networks's Issues

Imputing data inquiry

Hi there,

Firstly, thanks for your work. I have a few points of confusion regarding the paper that I would like to clarify.
In the paper, the model seems to be tasked with imputing data where some portion of the values were replaced with a constant.
However, I do not see how the data is imputed. The paper introduces a simplicial convolution, which seems to amount to learning a kind of filter. Does imputing the right values just mean filtering the input p-cochains so that they take the 'right' values?

Another question: is there a reason why the data is imputed on cochains rather than on chains?

Thanks

Regards :)

CC1 and CC2 Benchmarking

Hi there. I've read the associated paper and am very interested in the methodology. In particular (for a course project), I would like to see if I can improve on your results (even in a small way), but I am having trouble seeing how to run the given code on a test set. My questions are as follows:

  1. Does the s2 dataset (for which processing instructions are listed in the README) correspond to the CC2 set from the paper?
  2. Does your code have a quick way of reproducing your test metrics on the imputed data? I would be very happy to implement something like this.
  3. Is there a straightforward path to partitioning the given dataset for train / test purposes? Or perhaps managing 2 datasets, one for train and one for test?

Any clarity here would be very much appreciated. Thanks for the neat paper!

How to get the precision and error rate curves?

I have been studying your paper recently, and I am impressed by how well it is done. I have obtained the experimental results after convolution, but they are saved in text format. How can I plot the accuracy and error-rate curves shown in your paper? Would you be able to explain? Thank you very much!

Can angular information be directly encoded using this method?

Hi

I find this work very interesting, and I've been interested in applying it in my context, where graph networks are popular. For 3-body interactions, is it possible to effectively include "opening angles" or other 3-body quantities from the perspective of a certain node, or do things need to be expressed as hypervolumes? What would be the best way to approach 3-dimensional Euclidean spaces with this method?

I hope what I'm getting across makes sense! Again, thanks for this work; it seems like a natural extension of graph networks.
