hbioquant / diffbindfr
Diffusion model based protein-ligand flexible docking method
License: BSD 3-Clause Clear License
Hi!
I tried to run the "relax the structure" step and ran into the following error.
python pl_1.py ../app/test/reverse/structures -nb 12 -v
INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
0.00% | 0 / 1 |
0.00% | 0 / 1 |
0.00% | 0 / 1 |
0.00% | 0 / 1 |
0.00% | 0 / 1 |
0.00% | 0 / 1 |
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandarallel/core.py", line 95, in __call__
result = self.work_function(
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandarallel/data_types/dataframe.py", line 32, in work
return data.apply(
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandas/core/frame.py", line 10374, in apply
return op.apply().__finalize__(self, method="apply")
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandas/core/apply.py", line 916, in apply
return self.apply_standard()
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandas/core/apply.py", line 1063, in apply_standard
results, res_index = self.apply_series_generator()
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
results[i] = self.func(v, *self.args, **self.kwargs)
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandarallel/progress_bars.py", line 214, in closure
return user_defined_function(
File "/home/XXX/DiffBindFR/DiffBindFR/relax/pl_1.py", line 695, in process
relax_pl(
File "/home/XXX/DiffBindFR/DiffBindFR/relax/pl_1.py", line 527, in relax_pl
modeller.add(lig_top.to_openmm(), positions_with_units)
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/openmm/app/modeller.py", line 132, in add
newPositions.append(deepcopy(addPositions[atom.index]))
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/openmm/unit/quantity.py", line 773, in append
raise TypeError("Cannot append item without units into list with units")
TypeError: Cannot append item without units into list with units
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/XXX/DiffBindFR/DiffBindFR/relax/pl_1.py", line 764, in <module>
minimizer(
File "/home/XXX/DiffBindFR/DiffBindFR/relax/pl_1.py", line 707, in minimizer
df.parallel_apply(process, axis=1)
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/site-packages/pandarallel/core.py", line 333, in closure
results_promise.get()
File "/home/XXX/mambaforge/envs/dbfr/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
TypeError: Cannot append item without units into list with units
I tried modifying line 526 of pl.py as shown below, but it did not help.
positions_with_units = [position * mm_unit.nanometers for position in ligand_mol.conformers[0]]
modeller.add(lig_top.to_openmm(), positions_with_units)
BTW, I would like to ask two questions:
Thank you very much for providing such a great tool!
I tried to evaluate the model on the PDBBind dataset, but encountered KeyError: 'l-rmsd'; see the detailed messages below. Could you help me work out what the problem could be? I also tried the --debug parameter, but it gives the same error.
2024-05-12 17:50:39,324 - Evaluator - INFO - Use benchmark libs: pb
2024-05-12 17:50:39,440 - Evaluator - INFO - Total loaded jobs: 428.
2024-05-12 17:50:39,440 - Evaluator - INFO - Job Slice Info: (0, 428).
2024-05-12 17:50:39,440 - Evaluator - INFO - Running jobs: 428.
2024-05-12 17:50:39,446 - Evaluator - INFO - Start to prepare job (experiment name: posebusters).
Use Background Generator supported dataloader.
2024-05-12 17:50:39,451 - Evaluator - INFO - dock Status: Prep task is Done!
Initializing diffusion model...
2024-05-12 17:50:42,207 - Evaluator - INFO - load checkpoint from local path: /mnt/disk04/haotiant/DiffBindFR/DiffBindFR/weights/diffbindfr_paper.pth
2024-05-12 17:50:42,554 - Evaluator - INFO - Reload model inference output from /mnt/disk04/haotiant/DiffBindFR/pb/export/posebusters/results/model_output.pt
2024-05-12 17:50:42,556 - Evaluator - INFO - Export binding structures....
2024-05-12 17:50:42,556 - Evaluator - INFO - Binding structure export is completed.
2024-05-12 17:50:42,556 - Evaluator - INFO - Start to binding conformation enrichment analysis...
Traceback (most recent call last):
File "/mnt/disk04/haotiant/DiffBindFR/DiffBindFR/evaluation/eval.py", line 276, in <module>
runner(df, args)
File "/mnt/disk04/haotiant/DiffBindFR/DiffBindFR/evaluation/eval.py", line 144, in runner
pd_df, eval_results_df = out_fn(
File "/mnt/disk04/haotiant/DiffBindFR/DiffBindFR/evaluation/eval.py", line 86, in out_fn
ptb = report_enrichment(eval_results_df, show_reports = show_reports)
File "/mnt/disk04/haotiant/DiffBindFR/DiffBindFR/evaluation/reporter.py", line 40, in report_enrichment
l_rmsd = eval_results_df['l-rmsd']
KeyError: 'l-rmsd'
Hi,
When running python DiffBindFR/app/predict.py -h, an import error occurred:
Traceback (most recent call last):
File "/DiffBindFR/DiffBindFR/app/predict.py", line 12, in <module>
import torch
File "/anaconda3/envs/diffbindfr_env/lib/python3.9/site-packages/torch/__init__.py", line 218, in <module>
from torch._C import * # noqa: F403
ImportError: /anaconda3/envs/diffbindfr_env/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
Judging from the output of "ldd -r /anaconda3/envs/diffbindfr_env/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so", libtorch_cpu.so seems to be missing a linked library.
Could you suggest how to fix this? Thanks in advance.
Dear Sir,
Following the documentation, I tried to reproduce the PoseBusters evaluation:
python DiffBindFR/evaluation/eval.py -d /home/rtakahashi/software/DiffDock/data/posebusters_benchmark_set -o /home/rtakahashi/Work/ -j dock -n posebusters -lb pb -np 40 -gpu 0 -cpu 8 -bs 16 -eval -rp
The run itself seems fine, but the analysis at the end fails:
Traceback (most recent call last):
File "/home/rtakahashi/software/DiffBindFR/DiffBindFR/evaluation/eval.py", line 277, in <module>
runner(df, args)
File "/home/rtakahashi/software/DiffBindFR/DiffBindFR/evaluation/eval.py", line 162, in runner
pd_df['smina_score'] = pd_df['docked_lig'].parallel_apply(
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandarallel/core.py", line 324, in closure
return wrapped_reduce_function(
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandarallel/core.py", line 199, in closure
return reduce_function(dfs, extra)
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandarallel/data_types/series.py", line 34, in reduce
return pd.concat(datas, copy=False)
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 285, in concat
op = _Concatenator(
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 339, in __init__
objs = list(objs)
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandarallel/core.py", line 195, in <genexpr>
get_dataframe_and_delete_file(output_file_path)
File "/home/rtakahashi/mambaforge/envs/diffbindfr/lib/python3.9/site-packages/pandarallel/core.py", line 189, in get_dataframe_and_delete_file
data = pickle.load(file_descriptor)
EOFError: Ran out of input
This kind of DataFrame error is usually caused by something simple, such as the index ending up in a column, but I could not figure it out quickly. If you could give me a hint on how to solve this error, that would be great.
Many thanks,
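The final EOFError in the traceback above is what pickle.load raises when it is handed an empty file. One possible reading (an assumption, not confirmed anywhere in this thread) is that a pandarallel worker exited before writing its result file, for example because the scoring call crashed or the memory file system filled up. The sketch below reproduces only the pickle behaviour with the standard library; load_worker_result is a hypothetical helper and not part of pandarallel.

```python
import io
import pickle


def load_worker_result(stream):
    """Unpickle one worker result, returning None if the stream is empty."""
    try:
        return pickle.load(stream)
    except EOFError:
        # pickle raises EOFError ("Ran out of input") when there are no bytes to read
        return None


# A stream holding a pickled result, and an empty stream standing in for
# a result file that a worker never wrote to.
written = io.BytesIO(pickle.dumps({"smina_score": -7.1}))
empty = io.BytesIO(b"")

print(load_worker_result(written))
print(load_worker_result(empty))
```

If this reading applies, rerunning with fewer workers or checking the free space under /dev/shm would be a cheap first test.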
Hi, when running the reverse docking example
python predict.py -l ./reverse/ligand_1.sdf ./reverse/ligand_2.sdf -p ./reverse/receptors -o ./test -np 40 -gpu 0 -cpu 16 -bs 16 -n reverse
I got this error:
File "/DiffBindFR/druglib/datasets/lmdbdataset.py", line 67, in __getitem__
raise ValueError(f'query index {idx.decode()} not in lmdb.')
ValueError: query index 2src_protein not in lmdb.
Hi,
Is it possible to call pl.py just before scoring, for example through an argument of predict.py? If I call it after predict.py has finished scoring, the scores will not reflect the relaxed poses exactly.
Best,
Christian
Hi,
Just to let you know, this part in env.yaml did not work for me:
pip:
I ended up manually installing them like this:
pip install torch-scatter==2.1.0 -f https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
I also got this message when trying to run:
$ python DiffBindFR/app/predict.py -h
/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/pandas/core/computation/expressions.py:21: UserWarning: Pandas requires version '2.8.4' or newer of 'numexpr' (version '2.7.3' currently installed).
from pandas.core.computation.check import NUMEXPR_INSTALLED
I solved this by:
pip install numexpr --upgrade
it installed: numexpr 2.9.0
I don't know yet if this version will have some side effects.
So in theory I solved the problems... this issue can be closed.
Best
Hi,
I reinstalled DiffBindFR on a new PC using env.yaml. Nothing in this section was installed:
pip:
So I manually installed them:
pip install torch-cluster==1.6.0+pt113cu117 torch-scatter==2.1.0+pt113cu117 torch-sparse==0.6.16+pt113cu117 torch-spline-conv==1.2.1+pt113cu117 -f https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
pip install torch-geometric==2.2.0
Installing torch-geometric also installed scikit-learn==1.4.1 and changed joblib to 1.3.2. I could not revert joblib to 1.1.0 because it is not compatible with this version of torch-geometric. One thing I did not do was downgrade torchmetrics to 0.11.0 before installing torch-geometric; I left it at version 1.3.2.
Torch looks OK with CUDA:
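Since several package versions drifted during the manual install, it can help to record what is actually installed before comparing results. This is a minimal sketch using only the standard library; installed_versions is a hypothetical helper name, and the package list is simply the set of names mentioned in this thread.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Package names taken from the posts in this thread
packages = ["numpy", "joblib", "scikit-learn", "torchmetrics", "torch-geometric", "numexpr"]
for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'not installed'}")
```

Run inside the activated environment, this prints one line per package. Note that it reports Python distribution metadata only, so libraries installed by conda without such metadata would show as not installed.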
(diffbindfr) christian@Linux00:/media/christian/VS1/VS/VS_DiffBindFR$ python
Python 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.13.1'
>>> torch.cuda.is_available()
True
>>> torch.version.cuda
'11.7'
>>> exit()
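The wheel index URL in the pip commands above encodes the torch version and the CUDA version reported in this session. A small sketch of that naming pattern follows; pyg_wheel_index is a hypothetical helper, and the assumption that other torch/CUDA combinations follow the same URL pattern is not verified here.

```python
def pyg_wheel_index(torch_version, cuda_version):
    """Build a wheel index URL in the same form as the pip commands above."""
    # "11.7" becomes the tag "cu117"
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}+{cuda_tag}.html"


# Values as reported by torch.__version__ and torch.version.cuda in this post
print(pyg_wheel_index("1.13.1", "11.7"))
```

With the values from this post, the result matches the URL passed to pip with -f.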
This also works fine after I reinstalled torch-geometric:
python DiffBindFR/app/predict.py -h
However, I get these errors when trying to run one of the commands in the README file.
I do not know whether all of these errors come from the numpy version. However, I cannot downgrade numpy below 1.24.0 because the minimum version required by different packages is between 1.20 and 1.24.
(diffbindfr) christian@Linux00:/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/app$ python predict.py -l ../../examples/forward/mols -p ../../examples/forward/3dbs_protein.pdb -o ./test -np 40 -gpu 0 -cpu 6 -bs 6 -n forward
/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
warnings.warn(
[DiffBindFR ASCII-art banner]
2024-03-21 00:15:05,051 - DiffBindFR - INFO - Total loaded jobs: 15.
2024-03-21 00:15:05,051 - DiffBindFR - INFO - Job Slice Info: (0, 15).
2024-03-21 00:15:05,051 - DiffBindFR - INFO - Running jobs: 15.
2024-03-21 00:15:05,054 - DiffBindFR - INFO - Start to prepare job (experiment name: forward).
INFO: Pandarallel will run on 6 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
2024/03/21 00:15:05 - InferenceDataset - **ERROR - module 'numpy' has no attribute 'int'.**
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations from /media/christian/VS1/VS/VS_DiffBindFR/examples/forward/3dbs_protein.pdb from crystal ligand /media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/app/../../examples/forward/3dbs_protein_crystal.sdf
Use Background Generator supported dataloader.
2024-03-21 00:15:05,606 - DiffBindFR - INFO - dock Status: Prep task is Done!
Initializing diffusion model...
2024-03-21 00:15:07,631 - DiffBindFR - INFO - load checkpoint from local path: /media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/weights/diffbindfr_paper.pth
2024-03-21 00:15:08,466 - DiffBindFR - INFO - Running model inference...
[ ] 0/560, elapsed: 0s, ETA
Traceback (most recent call last):
File "/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/app/predict.py", line 265, in <module>
sys.exit(main())
File "/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/app/predict.py", line 260, in main
runner(df, args)
File "/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/app/predict.py", line 131, in runner
pairs_results = common.inferencer(
File "/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/common/engines.py", line 204, in inferencer
model_out = model_run(dl, model, show_traj)
File "/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/common/engines.py", line 174, in model_run
for idx, data in enumerate(dl):
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/prefetch_generator/__init__.py", line 116, in __next__
raise next_item
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/prefetch_generator/__init__.py", line 98, in run
for item in self.generator: self.queue.put((True , item))
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/christian/anaconda3/envs/diffbindfr/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/media/christian/VS1/VS/VS_DiffBindFR/druglib/datasets/custom_dataset.py", line 263, in __getitem__
data = self.get(idx)
File "/media/christian/VS1/VS/VS_DiffBindFR/druglib/datasets/custom_dataset.py", line 326, in get
return self._prepare_test_sample(self.indices[idx])
File "/media/christian/VS1/VS/VS_DiffBindFR/DiffBindFR/common/inference_dataset.py", line 583, in _prepare_test_sample
mdl_inp = getattr(self.PairData, obj + 's')[row[name]].model_input
File "/media/christian/VS1/VS/VS_DiffBindFR/druglib/datasets/lmdbdataset.py", line 67, in __getitem__
raise ValueError(f'query index {idx.decode()} not in lmdb.')
ValueError: query index 3dbs_protein not in lmdb.
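The log above shows the InferenceDataset preparation step failing on the removed np.int alias before the "not in lmdb" error appears. A plausible reading (an assumption, not confirmed by the maintainers here) is that the protein was never written to the LMDB because its preparation raised that error. Following the NumPy message's advice to use int instead, the sketch below scans source text for the aliases that NumPy 1.24 removed. find_removed_aliases is a hypothetical helper; it only looks for the np. prefix and does not modify any file.

```python
import re

# Aliases deprecated in NumPy 1.20 and removed in NumPy 1.24.
REMOVED_ALIASES = ("bool", "int", "float", "complex", "object", "str")
# The trailing \b keeps valid names such as np.int64 or np.bool_ from matching.
PATTERN = re.compile(r"\bnp\.(" + "|".join(REMOVED_ALIASES) + r")\b")


def find_removed_aliases(source):
    """Return (line_number, alias) pairs for each removed alias in source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, "np." + match.group(1)))
    return hits


sample = (
    "idx = x.astype(np.int)\n"
    "ok = x.astype(np.int64)\n"
    "mask = np.zeros(3, dtype=np.bool)\n"
)
print(find_removed_aliases(sample))  # [(1, 'np.int'), (3, 'np.bool')]
```

Running it over the text of each .py file in the checkout would list the places where int (or an explicit width such as np.int64) needs to be substituted.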
Hi, many thanks for the wonderful work.
After installing from source, running DiffBindFR fails with "Segmentation fault (core dumped)".
I can't figure out what is the reason. Please give some suggestions. thanks in advance.
23/02/2024
Dear Professor,
I am writing to express my interest in your paper "DiffBindFR: An SE(3) Equivariant Network for Flexible Protein-Ligand Docking" that I read on arXiv. I found your paper very impressive and innovative, as it proposes a novel deep learning paradigm for protein-ligand docking that is invariant to rigid and flexible transformations.
I am a researcher in the field of computational biology and I am working on a project that involves docking large libraries of ligands to various protein targets. I am very curious about how your method performs on different datasets and scenarios, and I would love to test it on my own data.
I understand that your paper has not been published in a peer-reviewed journal yet, and I respect your decision to withhold your code until then. However, I would greatly appreciate it if you could share your code with me once your paper is accepted and published. I promise to use your code only for research purposes and to cite your paper properly.
Please let me know if you are willing to share your code with me and when I can expect to receive it. You can contact me at [email protected].
Thank you for your time and attention. I look forward to hearing from you soon.
Sincerely,
Xu
Dear Sir,
Many thanks for the very interesting work and for opening your code to the public. I would like to try it out on our systems.
conda env create -f env.yaml
Channels:
- pyg
- pytorch
- nvidia
- conda-forge
- bioconda
- defaults
- omnia
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- conda-forge::setuptools==59.5.0
- pytorch::pytorch==1.13.1
- conda-forge::python==3.9.18
- conda-forge::pymol-open-source
- conda-forge::pip
- conda-forge::pdbfixer
- conda-forge::openmmforcefields
- conda-forge::openmm==7.7
- conda-forge::openbabel
- conda-forge::mpi4py
- conda-forge::cudatoolkit==11.7
- conda-forge::ambertools
Current channels:
- https://conda.anaconda.org/pyg/linux-64
- https://conda.anaconda.org/pytorch/linux-64
- https://conda.anaconda.org/nvidia/linux-64
- https://conda.anaconda.org/conda-forge/linux-64
- https://conda.anaconda.org/bioconda/linux-64
- https://repo.anaconda.com/pkgs/main/linux-64
- https://repo.anaconda.com/pkgs/r/linux-64
- https://conda.anaconda.org/omnia/linux-64
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/conda-forge
- https://conda.anaconda.org/pytorch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
As the output above shows, environment creation failed. When I removed conda-forge from env.yaml, these errors went away, but there is still a conflict between the biopython and prody versions, as shown below.
INFO: pip is looking at multiple versions of prody to determine which version is compatible with other requirements. This could take a while.
The conflict is caused by:
The user requested biopython==1.80
prody 2.4.0 depends on biopython<=1.79
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
Pip subprocess error:
ERROR: Cannot install -r /home/rtakahashi/software/DiffBindFR/condaenv.dqt9m8py.requirements.txt (line 18) and biopython==1.80 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
failed
CondaEnvException: Pip failed
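The conflict reported by pip comes down to a component-wise comparison of release numbers: 1.80 is release 80 of the 1.x series, not the decimal 1.8, so it does not satisfy biopython<=1.79. The sketch below is a deliberate simplification of what pip does and handles only plain dotted numeric releases; both function names are hypothetical.

```python
def release_tuple(version):
    """Turn a plain dotted release such as '1.80' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_upper_bound(version, upper_bound):
    """Check 'version <= upper_bound' component by component."""
    return release_tuple(version) <= release_tuple(upper_bound)


# The pin requested in env.yaml versus the bound declared by prody 2.4.0
print(satisfies_upper_bound("1.80", "1.79"))  # False
print(satisfies_upper_bound("1.79", "1.79"))  # True
```

Under this comparison, either the biopython pin has to move down to 1.79 or the prody version has to change, which is what pip's two suggestions amount to.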
Furthermore,
https://github.com/HBioquant/DiffBindFR/blob/main/DiffBindFR/relax/pl.py
https://github.com/HBioquant/DiffBindFR/blob/main/DiffBindFR/app/predict.py
https://github.com/HBioquant/DiffBindFR/blob/main/notebooks/AF2_model_docking.ipynb
These files are missing.
When you get a chance, could you look at these issues?
Many thanks,
Dear Dr Zhu,
I did a few tests. One of my targets has a Ca2+ ion in the binding pocket that has been shown, in co-crystals, to be important for binding. The receptor models I get from docking with DiffBindFR seem to be stripped of hydrogens, water molecules and ions. This is probably a limitation of the approach. Still, the binding poses of the co-crystallized ligands look very similar to those in the co-crystal. However, not having the Ca2+ there could affect the scoring of the different poses. I just wanted to double-check with you whether it is possible to keep the Ca2+.
Anyways, most of my targets do not have ions in the binding pocket so DiffBindFR will still be very useful.
From my tests, I think smina scoring works much better than mdn. I don't think I'm going to rely on mdn.
In terms of speed, at 40 poses/ligand, it takes about 2 min/ligand. If I increase -bs too much, I run out of GPU memory in PyTorch, and if I increase -cpu too much, the script crashes. I found a sweet spot that seems to work well. For screening big libraries, a few seconds here and there make a huge difference in the end. Do you have any suggestions?
Thanks for your help,
Christian
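To put the "few seconds here and there" in numbers, the arithmetic can be sketched as below. The 2 min/ligand figure comes from the post; the library size of 10,000 ligands, the 10-second saving and the assumption of perfect scaling across GPUs are all hypothetical, as is the function name.

```python
def screening_hours(n_ligands, seconds_per_ligand, n_gpus=1):
    """Estimate wall-clock hours, assuming perfect scaling across GPUs."""
    return n_ligands * seconds_per_ligand / n_gpus / 3600


# Hypothetical library of 10,000 ligands on a single GPU
baseline = screening_hours(10_000, 120)  # 2 min/ligand, as reported in the post
faster = screening_hours(10_000, 110)    # hypothetical 10 s saved per ligand

print(round(baseline, 1))           # 333.3
print(round(faster, 1))             # 305.6
print(round(baseline - faster, 1))  # 27.8
```

Under these assumptions, saving 10 seconds per ligand shortens the run by more than a day.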