
Molecular Contrastive Learning of Representations via Graph Neural Networks

Nature Machine Intelligence [Paper] [arXiv] [PDF]

Yuyang Wang, Jianren Wang, Zhonglin Cao, Amir Barati Farimani
Carnegie Mellon University

This is the official implementation of MolCLR: "Molecular Contrastive Learning of Representations via Graph Neural Networks". In this work, we introduce a contrastive learning framework for molecular representation learning on a large unlabelled dataset (~10M unique molecules). MolCLR pre-training greatly boosts the performance of GNN models on various downstream molecular property prediction benchmarks. If you find our work useful in your research, please cite:
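To give a flavor of the contrastive objective, here is a minimal NumPy sketch of an NT-Xent-style loss over two augmented views of a batch of molecule embeddings. This is an illustration of the loss family, not the repo's implementation; the function name, shapes, and temperature are assumptions.

```python
import numpy as np

def nt_xent_loss(z_i, z_j, temperature=0.1):
    """NT-Xent-style contrastive loss sketch (illustrative, not the repo's code).

    z_i, z_j: (N, D) embeddings of two augmented views of the same N molecules.
    """
    z = np.concatenate([z_i, z_j], axis=0)                # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # cosine similarity
    sim = z @ z.T / temperature                           # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                        # exclude self-comparisons
    n = z_i.shape[0]
    # row k's positive is the other view of the same molecule
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Identical views yield a near-zero loss, while unrelated views give a loss around log(2N - 1), which is what drives the two augmented graphs of a molecule toward a shared representation.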

@article{wang2022molclr,
  title={Molecular contrastive learning of representations via graph neural networks},
  author={Wang, Yuyang and Wang, Jianren and Cao, Zhonglin and Barati Farimani, Amir},
  journal={Nature Machine Intelligence},
  pages={1--9},
  year={2022},
  publisher={Nature Publishing Group},
  doi={10.1038/s42256-022-00447-x}
}

Getting Started

Installation

Set up the conda environment and clone the GitHub repo:

# create a new environment
$ conda create --name molclr python=3.7
$ conda activate molclr

# install requirements
$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install torch-geometric==1.6.3 torch-sparse==0.6.9 torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.7.0+cu110.html
$ pip install PyYAML
$ conda install -c conda-forge rdkit=2020.09.1.0
$ conda install -c conda-forge tensorboard
$ conda install -c conda-forge nvidia-apex # optional

# clone the source code of MolCLR
$ git clone https://github.com/yuyangw/MolCLR.git
$ cd MolCLR

Dataset

You can download the pre-training data and benchmarks used in the paper here and extract the zip file under the ./data folder. The pre-training data can be found in pubchem-10m-clean.txt. The datasets for fine-tuning are saved in folders named after each benchmark. You can also find the benchmarks at MoleculeNet.
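The pre-training file is a plain text file with one SMILES string per line, so it can be streamed without loading all ~10M molecules into memory at once. A hedged sketch of a loader; the filename matches the README, but the helper itself is an assumption, not the repo's data pipeline:

```python
def load_smiles(path, limit=None):
    """Read SMILES strings (one per line) from a text file.

    Illustrative helper, not part of the MolCLR codebase. `limit` caps the
    number of lines scanned, which is handy for quick experiments.
    """
    smiles = []
    with open(path) as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            s = line.strip()
            if s:  # skip blank lines
                smiles.append(s)
    return smiles
```

For example, `load_smiles("data/pubchem-10m-clean.txt", limit=1000)` would pull the first thousand molecules for a smoke test.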

Pre-training

To pre-train MolCLR, run the command below; the configurations and a detailed explanation of each variable can be found in config.yaml:

$ python molclr.py

To monitor the training via TensorBoard, run tensorboard --logdir ckpt/{PATH} and open http://127.0.0.1:6006/.
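Pre-training contrasts two augmented views of each molecular graph; the paper's augmentations include atom masking, bond deletion, and subgraph removal. A minimal sketch of atom masking on an array of atom-type ids; the mask token and ratio here are illustrative assumptions, not the repo's exact settings:

```python
import numpy as np

MASK_TOKEN = -1  # placeholder id for a masked atom (assumption, not the repo's value)

def mask_atoms(atom_features, mask_ratio=0.25, rng=None):
    """Randomly replace a fraction of atom-type ids with a mask token.

    atom_features: (num_atoms,) integer array of atom type ids.
    Returns a new array; the input is left untouched.
    """
    rng = rng or np.random.default_rng()
    out = atom_features.copy()
    n = len(out)
    k = max(1, int(round(n * mask_ratio)))         # mask at least one atom
    idx = rng.choice(n, size=k, replace=False)     # atoms chosen without replacement
    out[idx] = MASK_TOKEN
    return out
```

Applying two independent augmentations to the same molecule produces the positive pair that the contrastive loss pulls together.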

Fine-tuning

To fine-tune the MolCLR pre-trained model on downstream molecular benchmarks, run the command below; the configurations and a detailed explanation of each variable can be found in config_finetune.yaml:

$ python finetune.py

Pre-trained models

We also provide pre-trained GCN and GIN models, which can be found in ckpt/pretrained_gcn and ckpt/pretrained_gin, respectively.

Acknowledgement
