Code Monkey home page Code Monkey logo

moleculargnn_smiles's Introduction

Graph neural network (GNN) for molecular property prediction (SMILES format)

Important: this repository will not be further developed and maintained because we have shown and believe that graph neural networks or graph convolutional networks are incorrect and useless for modeling molecules (see our paper in NeurIPS 2020). Please consider switching to our new and simple machine learning model called quantum deep field.

This is the code of a graph neural network (GNN) for molecules, which is based on learning representations of r-radius subgraphs (or fingerprints) in molecules. This GNN is proposed in our paper "Compound-protein Interaction Prediction with End-to-end Learning of Neural Networks for Graphs and Sequences (Bioinformatics, 2018)," which aims to predict compound-protein interactions for drug discovery. Using the proposed GNN, in this page we provide an implementation of the model for predicting various molecular properties such as drug efficacy and photovoltaic efficiency.

Characteristics

  • This code is easy to use. After setting the environment (e.g., PyTorch), preprocessing data and learning a model can be done by only two commands (see "Usage").
  • If you prepare a molecular property dataset with the same format as provided in the dataset directory, you can learn our GNN with your dataset by the two commands (see "Training of our GNN using your molecular property dataset").

Our GNN model

The basic idea of a GNN can be described as follows:

The GNN (1) updates the randomly initialized atom vectors considering the graph structure of a molecule, (2) obtains the molecular vector, and then (3) learns the neural network parameters including the atom vectors via backpropagation to predict a molecular property. That is, this is the end-to-end learning that does not require input features or descriptors used in chemoinformatics.

In drug compounds, for example, each atom, chemical bond, and their connections in a molecular graph are not so important. More important in drug compounds is to consider relatively large fragments in a molecular graph, e.g., β-lactam in penicillin. Such fragments are referred to as r-radius subgraphs or molecular fingerprints. Based on this observation, our GNN leverages molecular fingerprints and the model can be described as follows:

Thus, instead of using atom vectors, we (1) extract the fingerprints from a molecular graph and initialize them using random vectors, (2) obtain the molecular vector by GNN, and then (3) learn the representations. This GNN allows us to learn the representations of molecular fingerprints.

Requirements

  • PyTorch
  • scikit-learn
  • RDKit

Usage

We provide two major scripts in the main directory as follows.

  • "preprocessing.py" creates tensor data from original text data (see dataset/././data.txt).
  • "train.py" trains a GNN model using the preprocessed data to predict a molecular property.

You can easy to train a GNN model by the following commands.

Clone our repository,

git clone https://github.com/masashitsubaki/molecularGNN_smiles.git

change directory,

cd molecularGNN_smiles/main

and run the bash file for training.

bash train.sh

You can also change the model hyperparameters described in train.sh (e.g., the dimensionality, number of hidden layers, and batch size). Try to learn various GNN models to find your own best model for your dataset!

Learning a GNN with your dataset

In this repository, we provide two datasets of regression (see dataset/regression/photovoltaic/data.txt) and classification (see dataset/classification/HIV/data.txt) as follows:

If you prepare a dataset with the same format (any molecular property can be used!), you can learn a GNN model with your dataset.

How to cite

@article{tsubaki2018compound,
  title={Compound-protein Interaction Prediction with End-to-end Learning of Neural Networks for Graphs and Sequences},
  author={Tsubaki, Masashi and Tomii, Kentaro and Sese, Jun},
  journal={Bioinformatics},
  year={2018}
}

moleculargnn_smiles's People

Contributors

masashitsubaki avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.