Multi-Scale Representation Learning on Proteins

(Under Construction and Subject to Change)

Pending: Update links for dataset.

This is the official PyTorch implementation of HoloProt (Somnath et al. 2021).

Binaries

Our work utilizes several binaries for generating surfaces, compressing them and computing chemical features and secondary structures.

  • MSMS (2.6.1). To compute the surface of proteins.
  • DSSP. To compute the secondary structure of proteins.
  • BLENDER. To fix meshes and remove any redundancies, while reducing them to a desired number of faces.
  • PDB2PQR (2.1.1), multivalue, and APBS (1.5). These programs are necessary to compute electrostatic charges.

Environment Variables

After downloading the binaries, one needs to set environment variables to the corresponding paths.

echo 'export PROT=/path/to/dir/' >> ~/.bashrc
echo 'export DSSP_BIN=' >> ~/.bashrc
echo 'export MSMS_BIN=/path/to/msms/' >> ~/.bashrc
echo 'export APBS_BIN=/path/to/apbs/bin/apbs' >> ~/.bashrc
echo 'export BLENDER_BIN=/path/to/blender/blender' >> ~/.bashrc
echo 'export PDB2PQR_BIN=/path/to/pdb2pqr/pdb2pqr' >> ~/.bashrc
echo 'export MULTIVALUE_BIN=/path/to/apbs/share/apbs/tools/bin/multivalue' >> ~/.bashrc
source ~/.bashrc

As a sanity check for correct installation, run each binary from the command line (for example, $APBS_BIN) and check that it produces meaningful output. If it fails with a lib.xx.xx.so not found error, try adding the appropriate directories to your LD_LIBRARY_PATH.
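The sanity check above can also be scripted. The following is a minimal sketch, not part of the repository: it only assumes the variable names from the export lines above, and the helper name `check_binaries` is hypothetical. It reports whether each variable is set and points to an executable file; it does not run the binaries, so a missing shared library would still only show up when you invoke them.

```python
import os

# Variable names copied from the export lines above.
# PROT is left out because it is a project directory, not a binary.
BINARY_VARS = [
    "DSSP_BIN", "MSMS_BIN", "APBS_BIN",
    "BLENDER_BIN", "PDB2PQR_BIN", "MULTIVALUE_BIN",
]


def check_binaries(env=None):
    """Return {variable: status} for each expected binary variable.

    Status is one of: 'unset', 'missing', 'directory',
    'not executable', 'ok'. (Hypothetical helper, for illustration.)
    """
    env = os.environ if env is None else env
    report = {}
    for var in BINARY_VARS:
        path = env.get(var, "")
        if not path:
            report[var] = "unset"
        elif not os.path.exists(path):
            report[var] = "missing"
        elif os.path.isdir(path):
            report[var] = "directory"
        elif not os.access(path, os.X_OK):
            report[var] = "not executable"
        else:
            report[var] = "ok"
    return report


if __name__ == "__main__":
    for var, status in check_binaries().items():
        print(f"{var}: {status}")
```

A status of `directory` is not necessarily an error: if one of the variables is meant to point at a folder in your setup, treat that line as informational.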

Installation

To install all dependencies, run

./install_dependencies.sh

For Jupyter notebook support (which may have errors), run the following commands inside the prot environment:

conda install -c anaconda ipykernel
python -m ipykernel install --user --name=prot

Change the kernel name to prot or create a new ipython notebook using prot as the kernel.

Datasets

Datasets are organized in the $PROT/datasets directory. Raw datasets are placed in $PROT/datasets/raw and processed datasets in $PROT/datasets/processed.

Dataset Download

Download the PDBBind dataset from here and the Enzyme dataset from here, place the archives in $PROT/datasets/raw, and untar them there.
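If you prefer to unpack the archives from Python rather than with tar, a sketch along the following lines may help. It is an illustration only: the archive file names are not given here, so the helper (hypothetically named `untar_all`) simply extracts every `.tar`, `.tar.gz`, or `.tgz` file it finds in the directory you pass in. Only run it on archives you trust.

```python
import glob
import os
import tarfile


def untar_all(raw_dir):
    """Extract every tar archive found directly in raw_dir, in place.

    Returns the list of archives that were extracted.
    (Hypothetical helper; the archive names are not assumed.)
    """
    extracted = []
    for pattern in ("*.tar", "*.tar.gz", "*.tgz"):
        for archive in sorted(glob.glob(os.path.join(raw_dir, pattern))):
            # tarfile.open detects the compression automatically.
            with tarfile.open(archive) as tar:
                tar.extractall(path=raw_dir)
            extracted.append(archive)
    return extracted
```

A typical call would be `untar_all(os.path.join(os.environ["PROT"], "datasets", "raw"))`, assuming PROT is set as described above.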

TODO: Add examples on how people can process their own pdb files

Dataset Cleanup and Running binaries

Before preparing the graph objects, we need to clean up the PDB files and run the binaries. The available tasks are:

  • pdbfixer: Clean up PDB files and add any missing residues.
  • dssp: Secondary structure computation using the DSSP binary
  • surface: Constructs the triangular surface mesh using MSMS and compresses it to a desired size using BLENDER
  • charges: Computes electrostatics on the given surface using PDB2PQR, APBS and MULTIVALUE binaries
  • all: Runs all the tasks listed above

python -W ignore scripts/preprocess/run_binaries.py --dataset DATASET_NAME --tasks TASK_NAME

where DATASET_NAME is one of pdbbind, enzyme, and TASK_NAME is one of pdbfixer, dssp, surface, charges, all.
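To run several tasks one after another and stop at the first failure, the command above can be wrapped in a small driver. This is a sketch under stated assumptions: the helper names `build_command` and `run_tasks` are hypothetical, each task is passed to `--tasks` one at a time, and the script is launched from the repository root. The dataset and task names are the ones listed above.

```python
import subprocess

DATASETS = ("pdbbind", "enzyme")
TASKS = ("pdbfixer", "dssp", "surface", "charges", "all")


def build_command(dataset, task):
    """Build the run_binaries.py command line for one dataset and one task."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        "python", "-W", "ignore",
        "scripts/preprocess/run_binaries.py",
        "--dataset", dataset,
        "--tasks", task,
    ]


def run_tasks(dataset, tasks, dry_run=True):
    """Run the given tasks in order; with dry_run=True only print them."""
    commands = [build_command(dataset, task) for task in tasks]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
    return commands
```

For example, `run_tasks("pdbbind", ["pdbfixer", "dssp"])` prints the two commands without executing them; pass `dry_run=False` to actually run them.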

Superpixel Preparation

Molecular superpixels are constructed using a modified version of ERS. Follow the steps below to first prepare the surface graphs and then generate the molecular superpixel assignments:

python -W ignore scripts/preprocess/prepare_graphs.py --dataset DATASET_NAME --prot_mode surface
python -W ignore scripts/preprocess/generate_patches.py --dataset DATASET_NAME --seg_mode ers --n_segments N_SEGMENTS

HoloProt Graph Construction

EXP_NAME="ERS_balance=0.5_n_segments=20"
python -W ignore scripts/preprocess/prepare_graphs.py --dataset DATASET_NAME --prot_mode surface2backbone
python -W ignore scripts/preprocess/prepare_graphs.py --dataset DATASET_NAME --prot_mode patch2backbone --exp_name $EXP_NAME --n_segments 20

After preprocessing, check if the following directories exist: $PROT/datasets/processed/DATASET_NAME/surface2backbone and $PROT/datasets/processed/DATASET_NAME/patch2backbone_n_segments=20
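The directory check can be automated. The sketch below is an illustration rather than repository code: the helper names `expected_dirs` and `missing_dirs` are hypothetical, and the two directory names are taken verbatim from the paths mentioned above.

```python
import os


def expected_dirs(prot_root, dataset, n_segments=20):
    """Return the two processed directories described above."""
    base = os.path.join(prot_root, "datasets", "processed", dataset)
    return [
        os.path.join(base, "surface2backbone"),
        os.path.join(base, f"patch2backbone_n_segments={n_segments}"),
    ]


def missing_dirs(prot_root, dataset, n_segments=20):
    """Return the expected directories that do not exist yet."""
    return [
        path
        for path in expected_dirs(prot_root, dataset, n_segments)
        if not os.path.isdir(path)
    ]
```

Calling `missing_dirs(os.environ["PROT"], "pdbbind")` returns an empty list when both directories are present.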

Running Experiments

We use wandb to track our experiments. Please make sure wandb is set up before running any experiments.

Default configurations for running experiments can be found in config/train/DATASET_NAME/.

For PDBBind, the files are organized as config/train/pdbbind/SPLIT.yaml where SPLIT is one of {identity30, identity60, scaffold}.

For the Enzyme dataset, the file is config/train/enzyme/default_config.yaml.

To run the experiments for PDBBind:

python scripts/train/run_model.py --config_file config/train/pdbbind/SPLIT.yaml

To run the experiments for Enzyme:

python scripts/train/run_model.py --config_file config/train/enzyme/default_config.yaml
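The mapping from dataset and split to config file can be captured in a few lines, which is handy when launching all PDBBind splits in a loop. This is a sketch only: the helper names `config_path` and `train_command` are hypothetical, while the paths and split names are those given above.

```python
PDBBIND_SPLITS = ("identity30", "identity60", "scaffold")


def config_path(dataset, split=None):
    """Return the default config file for a dataset (and split, for pdbbind)."""
    if dataset == "pdbbind":
        if split not in PDBBIND_SPLITS:
            raise ValueError(f"split must be one of {PDBBIND_SPLITS}, got {split!r}")
        return f"config/train/pdbbind/{split}.yaml"
    if dataset == "enzyme":
        return "config/train/enzyme/default_config.yaml"
    raise ValueError(f"unknown dataset: {dataset!r}")


def train_command(dataset, split=None):
    """Build the run_model.py command line for the chosen config."""
    return [
        "python", "scripts/train/run_model.py",
        "--config_file", config_path(dataset, split),
    ]
```

For example, `[train_command("pdbbind", s) for s in PDBBIND_SPLITS]` yields one command per split, each of which can be passed to `subprocess.run`.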

Please raise an issue if the commands don't work as expected, or you need help interpreting an error message.

License

This project is licensed under the MIT License. Please see LICENSE.md for more details.

Reference

If you find our code useful for your work, please cite our paper:

@inproceedings{
somnath2021multiscale,
title={Multi-Scale Representation Learning on Proteins},
author={Vignesh Ram Somnath and Charlotte Bunne and Andreas Krause},
booktitle={Advances in Neural Information Processing Systems},
editor={A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan},
year={2021},
url={https://openreview.net/forum?id=-xEk43f_EO6}
}

Please also consider citing the MaSIF work, whose code we use for preparing and computing features on surfaces:

@article{gainza2020deciphering,
  title={Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning},
  author={Gainza, P and Sverrisson, F and Monti, F and Rodol{\`a}, E and Boscaini, D and Bronstein, MM and Correia, BE},
  journal={Nature Methods},
  volume={17},
  number={2},
  pages={184--192},
  year={2020},
  publisher={Nature Publishing Group}
}

Contact

If you have any questions about the code, or want to report a bug, or need help interpreting an error message, please raise a GitHub issue.
