PROTAC-RL

Source code for the Nature Machine Intelligence paper "Accelerated rational PROTAC design via deep learning and molecular simulations".

PROTAC-RL is a novel deep reinforcement learning-driven generative model for the rational design of PROTACs in a low-resource setting.

Install requirements

Python = 3.6.10

rdkit = 2019.09.2.0, torch = 1.8.0+cu111

Python packages listed in environment.yml

To install all the Python packages, create a new conda environment:

conda env create -f environment.yml
conda activate PROTAC-RL
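
After activating the environment, a quick check along the following lines can confirm that the pinned versions listed above were picked up. This is a minimal sketch and not part of the repository: the helper name installed_versions is hypothetical, and it relies only on the `__version__` attribute that both rdkit and torch expose.

```python
import importlib
import sys
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its version string.

    The value is None when the package cannot be imported or does not
    expose a __version__ attribute.
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    # Expected per the requirements above: Python 3.6.10,
    # rdkit 2019.09.2.0, torch 1.8.0+cu111.
    print("python", sys.version.split()[0])
    for name, version in installed_versions(["rdkit", "torch"]).items():
        print(name, version if version is not None else "not found")
```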

Pre-processing

The tokenized datasets can be found in the data/ folder.

The PROTAC and ZINC datasets are taken from PROTAC-DB and ZINC (molecular weight > 500), respectively. Each dataset comes in two SMILES forms: canonical SMILES and random SMILES (used for data augmentation).

We use a shared vocabulary. The vocab_size and seq_length are chosen to cover the entire datasets.
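
To illustrate what covering the entire datasets means for vocab_size and seq_length, here is a minimal sketch that tokenizes SMILES strings and collects one shared vocabulary plus the longest token sequence across several datasets. It assumes the regex-based SMILES tokenization commonly used with sequence models; the repository's own pre-processing may differ in detail, and the function names and toy molecules are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Regex commonly used to split SMILES into tokens. Assumption: the
# repository's own tokenizer may differ from this pattern.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, rejecting unknown characters."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("untokenizable characters in SMILES: %r" % smiles)
    return tokens


def build_shared_vocab(datasets: Iterable[Iterable[str]]) -> Tuple[List[str], int]:
    """Return the sorted union of tokens and the longest token sequence length."""
    vocab = set()
    max_len = 0
    for dataset in datasets:
        for smiles in dataset:
            tokens = tokenize(smiles)
            vocab.update(tokens)
            max_len = max(max_len, len(tokens))
    return sorted(vocab), max_len


if __name__ == "__main__":
    # Toy stand-ins for the ZINC and PROTAC datasets (hypothetical molecules).
    zinc_like = ["CC(=O)Oc1ccccc1C(=O)O", "CCO"]
    protac_like = ["ClCCBr", "[nH]1cccc1"]
    vocab, max_len = build_shared_vocab([zinc_like, protac_like])
    print("vocabulary size:", len(vocab))
    print("longest sequence (tokens):", max_len)
```

A vocabulary size and sequence length at least as large as the two printed values would cover every molecule in both toy datasets. Any special tokens the model adds are not counted here.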

To keep the same vocabulary for pre-training and fine-tuning, remember to move the .vocab.pt file from the ZINC dataset folder to the PROTAC dataset folder after the PROTAC dataset has been pre-processed.
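
The vocabulary swap can also be scripted. The sketch below is an illustration only: the function name and the example paths are hypothetical, so substitute the actual .vocab.pt locations produced by pre-processing. It copies instead of moving, so the ZINC folder keeps its vocabulary in case pre-training needs to be rerun.

```python
import os
import shutil


def replace_vocab(zinc_vocab: str, protac_vocab: str) -> None:
    """Overwrite the PROTAC vocabulary file with the ZINC one.

    Both arguments are full paths to .vocab.pt files. The ZINC file is
    copied, not moved, so it stays available for pre-training.
    """
    if not os.path.isfile(zinc_vocab):
        raise FileNotFoundError(zinc_vocab)
    shutil.copyfile(zinc_vocab, protac_vocab)


# Example call (hypothetical paths, adjust to your data/ layout):
# replace_vocab("data/ZINC/ZINC.vocab.pt", "data/PROTAC/PROTAC.vocab.pt")
```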

NOTICE

GitHub ignores the originally empty log folder in our code. To avoid a FileNotFoundError, create a new log folder with mkdir first.
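
The same step can be done from Python before launching any script. This is a small sketch, assuming the folder is named log and sits in the directory the scripts are run from; check the paths used in the shell scripts if the error persists.

```python
import os


def ensure_log_dir(path: str = "log") -> str:
    """Create the log folder if it is missing and return its absolute path."""
    # exist_ok=True makes the call safe to repeat.
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```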

Pre-training

Pre-training can be started by running the training.sh script with the ZINC dataset.

Fine-tuning

The fine-tuning script fine-tune-training.sh can be run after pre-training, using the PROTAC dataset.

RL (beam search & multinomial sampling)

To train the RL model, use the train_case.sh script. Set train_type to B to train with beam search, or to M to train with multinomial sampling. The script file train_case.sh is annotated in detail.

In most cases, multinomial sampling performs better because it can explore a larger chemical space.
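
The difference between the two decoding strategies can be seen on a toy example. The sketch below is not the repository's implementation: toy_next_probs is a hypothetical stand-in for a trained model, and the three-token vocabulary is made up. Beam search deterministically keeps only the highest-scoring sequences, while multinomial sampling draws each token at random from the predicted distribution and so reaches many more distinct sequences.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Sequence = Tuple[str, ...]
NextProbs = Callable[[Sequence], Dict[str, float]]


def toy_next_probs(prefix: Sequence) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model's next-token distribution."""
    if len(prefix) >= 3:
        return {EOS: 1.0}
    return {"C": 0.5, "N": 0.3, "O": 0.2}


def beam_search(next_probs: NextProbs, beam_width: int,
                max_len: int) -> List[Tuple[Sequence, float]]:
    """Keep the beam_width highest log-probability sequences at each step."""
    beams = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:
                candidates.append((seq, score))  # finished: carry over unchanged
                continue
            for token, prob in next_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        beams = sorted(candidates, key=lambda item: item[1], reverse=True)[:beam_width]
    return beams


def multinomial_sample(next_probs: NextProbs, n_samples: int, max_len: int,
                       rng: random.Random) -> List[Sequence]:
    """Draw sequences token by token from the predicted distribution."""
    samples = []
    for _ in range(n_samples):
        seq = ()
        while len(seq) < max_len and (not seq or seq[-1] != EOS):
            probs = next_probs(seq)
            tokens = list(probs)
            weights = [probs[token] for token in tokens]
            seq = seq + (rng.choices(tokens, weights=weights, k=1)[0],)
        samples.append(seq)
    return samples


if __name__ == "__main__":
    beams = beam_search(toy_next_probs, beam_width=2, max_len=4)
    samples = multinomial_sample(toy_next_probs, 50, 4, random.Random(0))
    print("distinct sequences from beam search:", len({seq for seq, _ in beams}))
    print("distinct sequences from sampling:", len(set(samples)))
```

Beam search never returns more than beam_width sequences, whereas the number of distinct sampled sequences grows with the number of draws.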

For the input format of a case, refer to case/dBET6/.

Generation (beam search & multinomial sampling)

Generation with beam search can be started by running a testing_beam_search_*.sh script. Several similar generation scripts are provided, and the usage of each is described in the comments at the top of the file.

Generation with multinomial sampling can be started by running a testing_msearch_*.sh script. Again, several similar generation scripts are provided, each with usage notes in the comments at the top of the file.

Example

The example below shows step by step how to train and use PROTAC-RL:

Pre-process ZINC (with the script parameters set for ZINC)

bash preprocess.sh

Then set the script parameters for the PROTAC dataset and run it again

bash preprocess.sh

Pre-train with ZINC data

bash training.sh

Before fine-tuning, move .vocab.pt from the ZINC dataset folder to the PROTAC dataset folder, replacing the original one

Fine-tune with PROTAC data

bash fine-tuning.sh

Run RL training on the dBET6 case, with the scoring function set to PK

bash train_case.sh

Generate from the RL-trained model

bash testing_msearch_case.sh

Then find the generated output and log file in the dBET6 case folder

Reference

Please cite the following paper if you use this code in your work.

@article{zheng2022accelerated,
  title={Accelerated rational PROTAC design via deep learning and molecular simulations},
  author={Zheng, Shuangjia and Tan, Youhai and Wang, Zhenyu and Li, Chengtao and Zhang, Zhiqing and Sang, Xu and Chen, Hongming and Yang, Yuedong},
  journal={Nature Machine Intelligence},
  pages={1--10},
  year={2022},
  publisher={Nature Publishing Group}
}

Contact

@Shuangjia
