Code Monkey home page Code Monkey logo

geossl's Introduction

Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching

ICLR 2023

Authors: Shengchao Liu, Hongyu Guo, Jian Tang

[Project Page] [OpenReview] [ArXiv]

This repository provides the source code for the ICLR'23 paper Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching, with the following task:

  • This project explores the geometric representation learning on molecules.
  • We consider pure geometric information, i.e., the molecule conformation.
  • For pretraining, we consider Molecule3D.
  • For fine-tuning (downstream), we consider QM9, MD17, LBA & LEP.

Environments

conda create -n Geom3D python=3.7
conda activate Geom3D
conda install -y -c rdkit rdkit
conda install -y numpy networkx scikit-learn
conda install -y -c conda-forge -c pytorch pytorch=1.9.1
conda install -y -c pyg -c conda-forge pyg=2.0.2
pip install ogb==1.2.1

pip install sympy

pip install ase  # for SchNet

pip install atom3d # for Atom3D
pip install cffi # for Atom3D
pip install biopython # for Atom3D

pip intall -e .

Datasets

  • For Molecule3D (pretraining) dataset, please run examples/generate_Molecule3D.py. The default path is data/Molecule3D/Molecule3D_1000000.
  • For QM9, it is automatically downloaded in pyg class. The default path is data/molecule_datasets/qm9.
  • For MD17, it is automatically downloaded in pyg class. The default path is data/md17.
  • For LBA, please use the following commands:
cd data
mkdir -p lba/raw
mkdir -p lba/processed
cd lba/raw

wget http://www.pdbbind.org.cn/download/PDBbind_v2020_refined.tar.gz
tar -xzvf PDBbind_v2020_refined.tar.gz

wget https://zenodo.org/record/4914718/files/LBA-split-by-sequence-identity-30.tar.gz
tar -xzvf LBA-split-by-sequence-identity-30.tar.gz
mv split-by-sequence-identity-30/indices ../processed/
mv split-by-sequence-identity-30/targets ../processed/
  • For LEP, please use the following commands:
cd data
mkdir -p lep/raw
mkdir -p lep/processed
cd lep/raw

wget https://zenodo.org/record/4914734/files/LEP-raw.tar.gz
tar -xzvf LEP-raw.tar.gz
wget https://zenodo.org/record/4914734/files/LEP-split-by-protein.tar.gz
tar -xzvf LEP-split-by-protein.tar.gz

Pretraining

For pretraining, we provide implementations on eight pretraining baselines and our proposed GeoSSL-DDM under the examples folder:

  • Supervised pretraining in pretrain_Supervised.py.
  • Type prediction pretraining in pretrain_ChargePrediction.py.
  • Distance prediction pretraining in pretrain_DistancePrediction.py.
  • Angle prediction pretraining in pretrain_TorsionAnglePreddiction.py.
  • 3D InfoGraph pretraining in pretrain_3DInfoGraph.py.
  • GeoSSL pretraining framework in pretrain_GeoSSL.py.
    • GeoSSL-RR pretraining with argument --GeoSSL_option=RR.
    • GeoSSL-InfoNCE pretraining with argument --GeoSSL_option=InfoNCE.
    • GeoSSL-EBM-NCE pretraining with argument --GeoSSL_option=EBM-NCE.
    • GeoSSL-DDM pretraining (ours) with argument --GeoSSL_option=DDM.

The running scripts and corresponding hyper-parameters can be found in scripts/pretrain_baselines and scripts/pretrain_GeoSSL_DDM.

Downstream

The downstream scripts can be found under the examples folder:

  • finetune_qm9.py
  • finetune_md17.py
  • finetune_lba.py
  • finetune_lep.py

The running scripts and corresponding hyper-parameters can be found in scripts/finetune. Note that as a fair comparison, we keep a fixed hyper-parameter set for each downstream task, and the only difference is the pretrained checkpoints.

Checkpoints

We provide both the log files and checkpoints for GeoSSL-DDM here. The log files and checkpoints for other baselines will be released in the next version.

Cite us

Feel free to cite this work if you find it useful to you!

@inproceedings{
    liu2023molecular,
    title={Molecular Geometry Pretraining with {SE}(3)-Invariant Denoising Distance Matching},
    author={Shengchao Liu and Hongyu Guo and Jian Tang},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=CjTHVo1dvR}
}

geossl's People

Contributors

chao1224 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

dongcf

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.