flexpose's Introduction

FlexPose

FlexPose is a framework for AI-based flexible modeling of protein-ligand binding poses.

Fig1_b

A free, lightweight web server is available here.

Installation

Install prerequisite packages

FlexPose is implemented in PyTorch. All basic dependencies are listed in requirements.txt and most of them can be easily installed with pip install. We provide tested installation commands in install_cmd.txt for your reference.

Install FlexPose package

pip install -e .

Usage

Prediction

You can run FlexPose as shown in demo.py:

from FlexPose.utils.prediction import predict as predict_by_FlexPose

predict_by_FlexPose(
    protein='./FlexPose/example/4r6e/4r6e_protein.pdb',               # a protein path, or a list of paths
    ligand='./FlexPose/example/4r6e/4r6e_ligand.mol2',                # a ligand path (or SMILES), or a list of paths (or SMILES)
    ref_pocket_center='./FlexPose/example/4r6e/4r6e_ligand.mol2',     # a ligand-like file for selecting pocket, e.g. predictions from Fpocket
    # batch_csv='./FlexPose/example/example_input.csv',               # for batch prediction

    device='cuda:0',                                                  # device
    structure_output_path='./structure_output',                       # structure output
    output_result_path='./output.csv',                                # record output
)

| Argument | Description |
| --- | --- |
| protein | Input proteins (a list of paths) |
| ligand | Input ligands (a list of paths) |
| ref_pocket_center | Ligand-like files for pocket selection (a list of paths) |
| batch_csv | CSV file for batch prediction |
| ens | Ensemble number |
| structure_output_path | A folder for saving predicted structures |
| output_result_path | A CSV file for saving records |
| min | Energy minimization |
| min_loop | Energy minimization loops |
| min_constraint | Constraint constant for energy minimization (kcal/mol/Å^2) |
| model_conf | Output model confidence |
| device | Device (e.g. cuda:0) |
| batch_size | Batch size |
| prepare_data_with_multi_cpu | Prepare inputs with multiprocessing |
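
For several complexes, the list-valued arguments can be assembled from a folder of inputs. The sketch below assumes each complex sits in its own subfolder named by an ID and follows the `{id}_protein.pdb` / `{id}_ligand.mol2` naming of the bundled example. `collect_inputs` and `run_batch` are hypothetical helpers, not part of FlexPose, and reusing the ligand file as `ref_pocket_center` simply mirrors the demo call.

```python
from pathlib import Path


def collect_inputs(root):
    """Gather protein/ligand paths from the subfolders of `root`.

    Assumed layout (mirrors FlexPose/example/4r6e):
    root/<id>/<id>_protein.pdb and root/<id>/<id>_ligand.mol2.
    Folders missing either file are skipped.
    """
    proteins, ligands = [], []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        protein = folder / f"{folder.name}_protein.pdb"
        ligand = folder / f"{folder.name}_ligand.mol2"
        if protein.is_file() and ligand.is_file():
            proteins.append(str(protein))
            ligands.append(str(ligand))
    # As in demo.py, the ligand file doubles as the pocket reference.
    return {
        "protein": proteins,
        "ligand": ligands,
        "ref_pocket_center": list(ligands),
    }


def run_batch(root, device="cuda:0"):
    """Call FlexPose on every complex found under `root`."""
    # Imported lazily so collect_inputs works without FlexPose installed.
    from FlexPose.utils.prediction import predict as predict_by_FlexPose

    predict_by_FlexPose(
        **collect_inputs(root),
        device=device,
        structure_output_path="./structure_output",
        output_result_path="./output.csv",
    )
```

Calling `run_batch('./FlexPose/example')` would then process every example folder that contains both files.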

Training

Here we provide a pipeline for training a model on the PDBbind and APObind datasets. We recommend running these scripts from the root directory of FlexPose.

Data augmentation (Optional)

We use Rosetta to generate fake apo conformations from holo conformations. For each training iteration, there is a small probability that the model is trained with these fake conformations.
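
The per-iteration choice described above can be pictured as follows. This is only an illustrative sketch, not FlexPose's actual data loader: the function name, the entry keys, and the probability `p_fake=0.1` are assumptions made for the example.

```python
import random


def pick_conformation(entry, p_fake=0.1, rng=random):
    """Return the conformation to train on for one iteration.

    `entry` is a hypothetical dict holding a "holo" structure and a list
    of Rosetta-generated "fake_apo" structures. With probability `p_fake`
    (an assumed value) one fake apo conformation is drawn at random;
    otherwise the holo conformation is used.
    """
    fakes = entry.get("fake_apo", [])
    if fakes and rng.random() < p_fake:
        return rng.choice(fakes)
    return entry["holo"]
```

Entries without augmented conformations always fall back to the holo structure, so skipping the augmentation step does not break this sampling scheme.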

python FlexPose/preprocess/aug_pseudo_apo.py \
--apobind_path path/to/apobind \
--pdbbind_path path/to/pdbbind \
--save_path path/for/saving \
--n_rand_pert 3 \
--n_fixbb_repack 3 \
--n_flexbb_repack 3

You need to set --apobind_path and --pdbbind_path to the paths of the decompressed APObind and PDBbind datasets, and set --save_path to a folder for saving the augmented data.

NOTE: Generating all conformations takes hours to days (depending on the number of CPU cores used). We recommend performing the data augmentation on computers with multiple CPU cores. Alternatively, you can set --n_rand_pert, --n_fixbb_repack and --n_flexbb_repack to 0 to skip most of the processing.
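
When the augmentation is launched from Python, for instance once per dataset copy, the command can be assembled programmatically. The sketch below only rebuilds the command shown above with the three counts defaulting to 0; `build_aug_command` and `run_augmentation` are hypothetical helpers, and the script path assumes the current directory is the FlexPose root.

```python
import subprocess
import sys


def build_aug_command(apobind_path, pdbbind_path, save_path,
                      n_rand_pert=0, n_fixbb_repack=0, n_flexbb_repack=0):
    """Build the argument list for aug_pseudo_apo.py.

    The three counts default to 0 here (the quick setting from the note
    above); the full command above uses 3 for each.
    """
    return [
        sys.executable, "FlexPose/preprocess/aug_pseudo_apo.py",
        "--apobind_path", str(apobind_path),
        "--pdbbind_path", str(pdbbind_path),
        "--save_path", str(save_path),
        "--n_rand_pert", str(n_rand_pert),
        "--n_fixbb_repack", str(n_fixbb_repack),
        "--n_flexbb_repack", str(n_flexbb_repack),
    ]


def run_augmentation(*args, **kwargs):
    """Run the script from the FlexPose root; raises if it exits non-zero."""
    subprocess.run(build_aug_command(*args, **kwargs), check=True)
```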

Data preprocessing (Optional)

After data augmentation, we can generate the input files for training:

python FlexPose/preprocess/prepare_APOPDBbind.py \
--apobind_path path/to/apobind \
--pdbbind_path path/to/pdbbind \
--save_path path/for/saving \
--apo_info_path path/to/apobind_all.csv \
--aff_info_path path/to/INDEX_general_PL_data.{year} \
--aug_path path/to/data/augmentation \
--tmp_path ./tmp \
--max_len_pocket 150 \
--max_len_ligand 150

You need to set --apobind_path and --pdbbind_path to the paths of the decompressed APObind and PDBbind datasets (the same settings as in the data augmentation step), and set --save_path to a new folder for the preprocessed data. --apo_info_path is the path to apobind_all.csv, which is provided by APObind. --aff_info_path is the path to INDEX_general_PL_data.{year}, which is provided by PDBbind.

NOTE: Set --max_len_pocket and --max_len_ligand to a small number (e.g. 64) to get a toy dataset, which can speed up training.

Train your own model

If you want to skip data augmentation and data preprocessing, the preprocessed data can be found here. Now, we can train a toy FlexPose by running:

python FlexPose/train/train_APOPDBbind.py \
--data_path path/to/preprocessed/data \
--data_list_path path/to/data/split \
--batch_size 3 \
--lr 0.0005 \
--n_epoch 200 \
--dropout 0.1 \
--use_pretrain False \
--c_x_sca_hidden 32 \
--c_edge_sca_hidden 16 \
--c_x_vec_hidden 16 \
--c_edge_vec_hidden 8 \
--n_head 2 \
--c_block 2 \
--c_block_only_coor 1

You need to set --data_path to the preprocessed data and set --data_list_path to a path for saving the split data IDs.

You can also set --use_pretrain to True to use pre-trained encoders, and set --pretrain_protein_encoder and --pretrain_ligand_encoder to the paths of the pre-trained parameters (or set them to None to load our pre-trained encoders). By default, the pre-trained parameters are frozen to improve training efficiency.

Model confidence visualization

Fig_conf

You can visualize model confidence with PyMOL:

spectrum b, red_white_green, minimum=0, maximum=1
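
Because `spectrum b` colors atoms by the B-factor column, the same values can also be inspected without PyMOL. The sketch below assumes the predicted PDB file stores confidence values (0 to 1) in the standard B-factor field, columns 61-66 of ATOM/HETATM records. `mean_confidence_per_residue` is a hypothetical helper, not part of FlexPose.

```python
from collections import defaultdict


def mean_confidence_per_residue(pdb_lines):
    """Average the B-factor column per residue.

    Returns {(chain, residue_number, residue_name): mean_b_factor},
    using the fixed-column layout of PDB ATOM/HETATM records.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, atom count]
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 66:
            continue
        try:
            key = (line[21], int(line[22:26]), line[17:20].strip())
            b_factor = float(line[60:66])
        except ValueError:
            continue  # skip malformed records
        totals[key][0] += b_factor
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Passing the lines of a predicted structure file (for example `open(path).readlines()`) gives a per-residue summary that can be sorted to find the least confident regions.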

License

Released under the MIT license.

Citation

If you find our model useful in your research, please cite the relevant paper:

@article{dong2023equivariant,
  title={Equivariant Flexible Modeling of the Protein--Ligand Binding Pose with Geometric Deep Learning},
  author={Dong, Tiejun and Yang, Ziduo and Zhou, Jun and Chen, Calvin Yu-Chian},
  journal={Journal of Chemical Theory and Computation},
  year={2023},
  publisher={ACS Publications}
}

flexpose's People

Contributors

tiejundong

flexpose's Issues

errors when running demo.py

Hi,

Thanks for your brilliant work and code! I encountered an error when running demo.py (shown in the attached screenshot). Could you offer some suggestions?

Bests,

web server error

The web server only works when I use the provided input example but as soon as I use my protein and ligand it shows this error.

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpdlk2psln/structure_output_path/0.pdb'
Traceback:
File "/data/lab_website/.conda/envs/streamlit/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
    exec(code, module.__dict__)
File "/data/lab_website/lab_website_code/pages/1_FlexPose.py", line 141, in <module>
    view.addModel(grep_pdb_rename_chain(f'{structure_output_path}/0.pdb', 'ATOM', 'A'), 'pdb')
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/lab_website/lab_website_code/pages/1_FlexPose.py", line 132, in grep_pdb_rename_chain
    lines = open(pdb_path, 'r').readlines()
            ^^^^^^^^^^^^^^^^^^^

About pretraining code

Hi, your work is really good and solid! I'm wondering if you can upload your pretraining scripts for protein and ligand encoders? I think your pretraining tasks are quite interesting and informative. Thank you so much!

Problems downloading weights and with aromatic/symmetric compounds

Thanks for the work!

I tried to download the weights for local execution, but both from the command line and by direct download the speed is slow (< 160 Kb/s) and the download fails at around 80%.

When trying the web server version, I noticed that aromatic and double-bonded groups in non-protein compounds often violate co-planarity constraints to some degree (could this be due to a small minimization setting on the web server?), which does not happen with other systems such as DiffDock.
Moreover, big problems occur when docking symmetric compounds that contain multiple aromatic rings (see the attached example).
Atoms do not stay in place; it seems that during inference the system cannot tell which aromatic ring each atom belongs to, and the final structure ends up scrambled.
Thanks for any support!

Marco

Input_and_server_output.zip

The trained model

Great job!
But we would like to use this method locally. Could you provide the trained model? Thank you very much!

Confused about the training process

Hi Authors,

Thanks for the brilliant work! I am a little confused about the training process, where apo structures, holo structures, or fake apo structures are randomly selected as input. For flexible binding, wouldn't it be more intuitive to input apo structures and predict holo structures during training?

Thanks!

About .chk Files Download

Hi Tiejun,
I noticed that the .chk file seems to be uploaded to your personal website, and the download speed is about 150kb/s. Is there any other faster download method? For example, Google Drive or Baidu Cloud Disk.
Thank you.

About running demo.py

I installed FlexPose locally and ran python demo.py without errors, but no files were generated. What could be the reason?
