FlexPose

FlexPose is a framework for AI-based flexible modeling of protein-ligand binding poses.

A free light-weight web server can be found here.

Table of contents

Installation
Usage
License
Citation

Installation

Install prerequisite packages

FlexPose is implemented in PyTorch. All basic dependencies are listed in requirements.txt and most of them can be easily installed with pip install. We provide tested installation commands in install_cmd.txt for your reference.

Install FlexPose package

pip install -e .

Usage

Prediction

You can run FlexPose as shown in demo.py:

from FlexPose.utils.prediction import predict as predict_by_FlexPose

predict_by_FlexPose(
    protein='./FlexPose/example/4r6e/4r6e_protein.pdb',               # a protein path, or a list of paths
    ligand='./FlexPose/example/4r6e/4r6e_ligand.mol2',                # a ligand path (or SMILES), or a list of paths (or SMILES)
    ref_pocket_center='./FlexPose/example/4r6e/4r6e_ligand.mol2',     # a ligand-like file for selecting pocket, e.g. predictions from Fpocket
    # batch_csv='./FlexPose/example/example_input.csv',               # for batch prediction

    device='cuda:0',                                                  # device
    structure_output_path='./structure_output',                       # structure output
    output_result_path='./output.csv',                                # record output
)
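
Because `protein`, `ligand` and `ref_pocket_center` also accept lists, several complexes can be passed in one call. The sketch below is one way to assemble those lists; it assumes a hypothetical folder layout that mirrors the 4r6e example (`<root>/<id>/<id>_protein.pdb` and `<root>/<id>/<id>_ligand.mol2`). The helper names `collect_inputs` and `run_batch` are illustrative and not part of FlexPose.

```python
from pathlib import Path


def collect_inputs(root):
    """Build list-valued arguments from folders laid out like the 4r6e example.

    Assumed layout (hypothetical): <root>/<id>/<id>_protein.pdb and
    <root>/<id>/<id>_ligand.mol2. Folders missing either file are skipped.
    """
    proteins, ligands = [], []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        protein = folder / f"{folder.name}_protein.pdb"
        ligand = folder / f"{folder.name}_ligand.mol2"
        if protein.is_file() and ligand.is_file():
            proteins.append(str(protein))
            ligands.append(str(ligand))
    return {
        "protein": proteins,
        "ligand": ligands,
        # Assumption: each ligand file doubles as its own pocket reference,
        # as in the single-complex demo call.
        "ref_pocket_center": list(ligands),
    }


def run_batch(root, device="cuda:0"):
    """Pass the collected lists to FlexPose (imported lazily)."""
    from FlexPose.utils.prediction import predict as predict_by_FlexPose

    predict_by_FlexPose(
        **collect_inputs(root),
        device=device,
        structure_output_path="./structure_output",
        output_result_path="./output.csv",
    )
```

Calling `run_batch('./FlexPose/example')` would then predict every complex found under that folder. The `batch_csv` argument is the built-in alternative for batch prediction.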

Arguments	Descriptions
`protein`	Input proteins (a path or a list of paths)
`ligand`	Input ligands (a path or SMILES string, or a list of them)
`ref_pocket_center`	Ligand-like files for pocket selection (a path or a list of paths)
`batch_csv`	A csv file for batch prediction
`ens`	Ensemble number
`structure_output_path`	A folder for saving predicted structures
`output_result_path`	A csv file for saving records
`min`	Energy minimization
`min_loop`	Energy minimization loops
`min_constraint`	Constraint constant for energy minimization (kcal/mol/Å^2)
`model_conf`	Output model confidence
`device`	Device
`batch_size`	Batch size
`prepare_data_with_multi_cpu`	Prepare inputs with multiprocessing
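
To illustrate what a "ligand-like file for pocket selection" can provide, the sketch below computes the geometric center of the atoms in a mol2 file. This is an assumption about how such a reference might be used, suggested by the argument name `ref_pocket_center`; FlexPose's own pocket selection may differ, and `mol2_centroid` is a hypothetical helper.

```python
def mol2_centroid(path):
    """Return the (x, y, z) centroid of all atoms in a Tripos mol2 file."""
    coords = []
    in_atoms = False
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped.startswith("@<TRIPOS>"):
                # Only the ATOM record section holds coordinates.
                in_atoms = stripped == "@<TRIPOS>ATOM"
                continue
            if in_atoms and stripped:
                # Fields: atom_id, atom_name, x, y, z, atom_type, ...
                fields = stripped.split()
                coords.append(tuple(float(v) for v in fields[2:5]))
    if not coords:
        raise ValueError(f"no atoms found in {path}")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))
```

For example, `mol2_centroid('./FlexPose/example/4r6e/4r6e_ligand.mol2')` gives the center of the reference ligand used in the demo call.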

Training

This section provides a pipeline for training a model on the PDBbind and APObind datasets. We recommend running these scripts from the root directory of FlexPose.

Data augmentation (Optional)

We use Rosetta to generate fake apo conformations from holo conformations. For each training iteration, there is a small probability that the model is trained with these fake conformations.

python FlexPose/preprocess/aug_pseudo_apo.py \
--apobind_path path/to/apobind \
--pdbbind_path path/to/pdbbind \
--save_path path/for/saving \
--n_rand_pert 3 \
--n_fixbb_repack 3 \
--n_flexbb_repack 3

Set --apobind_path and --pdbbind_path to the paths of the decompressed APObind and PDBbind datasets, and set --save_path to a folder for saving the augmented data.

NOTE: Generating all conformations takes hours to days (depending on the number of CPU cores used). We recommend performing the data augmentation on computers with multiple CPU cores. Alternatively, you can set --n_rand_pert, --n_fixbb_repack and --n_flexbb_repack to 0 to skip most of the processing.
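
The augmentation command can also be assembled from Python, which makes it easy to switch between the full run and the quick variant with all three counts set to 0. This is a minimal sketch: `build_aug_command` is a hypothetical helper, and only the flags shown in the command above are used.

```python
import subprocess
import sys


def build_aug_command(apobind_path, pdbbind_path, save_path,
                      n_rand_pert=3, n_fixbb_repack=3, n_flexbb_repack=3):
    """Return the argument list for aug_pseudo_apo.py with the given settings."""
    return [
        sys.executable, "FlexPose/preprocess/aug_pseudo_apo.py",
        "--apobind_path", str(apobind_path),
        "--pdbbind_path", str(pdbbind_path),
        "--save_path", str(save_path),
        "--n_rand_pert", str(n_rand_pert),
        "--n_fixbb_repack", str(n_fixbb_repack),
        "--n_flexbb_repack", str(n_flexbb_repack),
    ]


# Quick variant: zero counts skip most of the processing.
quick_cmd = build_aug_command(
    "path/to/apobind", "path/to/pdbbind", "path/for/saving",
    n_rand_pert=0, n_fixbb_repack=0, n_flexbb_repack=0,
)

# Uncomment to execute; run from the root directory of FlexPose.
# subprocess.run(quick_cmd, check=True)
```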

Data preprocessing (Optional)

After data augmentation, generate the input files for training:

python FlexPose/preprocess/prepare_APOPDBbind.py \
--apobind_path path/to/apobind \
--pdbbind_path path/to/pdbbind \
--save_path path/for/saving \
--apo_info_path path/to/apobind_all.csv \
--aff_info_path path/to/INDEX_general_PL_data.{year} \
--aug_path path/to/data/augmentation \
--tmp_path ./tmp \
--max_len_pocket 150 \
--max_len_ligand 150

Set --apobind_path and --pdbbind_path to the paths of the decompressed APObind and PDBbind datasets (the same settings as in the data augmentation), and set --save_path to a new folder for saving the preprocessed data. --apo_info_path is the path to apobind_all.csv, which is provided by APObind. --aff_info_path is the path to INDEX_general_PL_data.{year}, which is provided by PDBbind.

NOTE: Set --max_len_pocket and --max_len_ligand to a small number (e.g. 64) to get a toy dataset, which can speed up training.

Train your own model

If you want to skip data augmentation and data preprocessing, the preprocessed data can be found here. Now, we can train a toy FlexPose by running:

python FlexPose/train/train_APOPDBbind.py \
--data_path path/to/preprocessed/data \
--data_list_path path/to/data/split \
--batch_size 3 \
--lr 0.0005 \
--n_epoch 200 \
--dropout 0.1 \
--use_pretrain False \
--c_x_sca_hidden 32 \
--c_edge_sca_hidden 16 \
--c_x_vec_hidden 16 \
--c_edge_vec_hidden 8 \
--n_head 2 \
--c_block 2 \
--c_block_only_coor 1

Set --data_path to the preprocessed data and set --data_list_path to a path for saving the split data IDs.

In addition, you can set --use_pretrain to True to use pre-trained encoders, and set --pretrain_protein_encoder and --pretrain_ligand_encoder to the paths of the pre-trained protein and ligand encoder parameters, respectively (or set them to None to load our pre-trained encoders). Pre-trained parameters are frozen by default to improve training efficiency.

Model confidence visualization

You can visualize model confidence with PyMOL:

spectrum b, red_white_green, minimum=0, maximum=1
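
The `spectrum b` command colors atoms by the B-factor field, which suggests that the confidence values are stored there. Assuming the predicted structures are saved in PDB format (an assumption, not confirmed here), the same values can be read without PyMOL. `read_b_factors` and `summarize` are hypothetical helpers.

```python
def read_b_factors(pdb_path):
    """Collect the B-factor field of every ATOM/HETATM record in a PDB file."""
    values = []
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                # PDB fixed-width format: temperature factor in columns 61-66.
                field = line[60:66].strip()
                if field:
                    values.append(float(field))
    return values


def summarize(values):
    """Return simple statistics for the collected values, or None if empty."""
    if not values:
        return None
    return {
        "n_atoms": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }
```

Applying `summarize(read_b_factors(path))` to a predicted structure gives a quick numeric check of the range that the 0 to 1 color scale above assumes.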

License

Released under the MIT license.

Citation

If you find our model useful in your research, please cite the relevant paper:

@article{dong2023equivariant,
  title={Equivariant Flexible Modeling of the Protein--Ligand Binding Pose with Geometric Deep Learning},
  author={Dong, Tiejun and Yang, Ziduo and Zhou, Jun and Chen, Calvin Yu-Chian},
  journal={Journal of Chemical Theory and Computation},
  year={2023},
  publisher={ACS Publications}
}
