schrojunzhang / karmadock

Home Page: https://www.nature.com/articles/s43588-023-00511-5

License: MIT License

Python 100.00%


karmadock's Issues

Why not use torch.scatter_reduce_ instead of the third-party pytorch_scatter operator

https://github.com/rusty1s/pytorch_scatter

https://pytorch.org/docs/stable/generated/torch.Tensor.scatter_reduce_.html#torch.Tensor.scatter_reduce_

Molecule does not have explicit Hs. Consider calling AddHs()

While using KarmaDock, I get the message: Molecule does not have explicit Hs. Consider calling AddHs(). What does this message mean? Does it affect the result? If so, what should be done to fix it?

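Is multi-ligand docking supported?

May I ask the author whether this tool supports multi-ligand docking?
Could the workflow also be simplified and integrated, so that results are produced directly from the docking input files alone?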

Steric clash in docking result file

Hi Xujun,

I ran KarmaDock with the receptor receptor.pdb.txt, the ligand ligand.smi.txt, and the crystal ligand Crystal_lig.mol2.txt. Many output structures have steric clashes with the receptor, e.g. CP1_pred_align_corrected.sdf.txt or CP6_pred_align_corrected.sdf.txt.

I used the YAML file in test.zip to build the environment.

Where do you think the issue is?

Thanks!

Hien

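How to run virtual screening when the protein pocket is unknown

Thank you for your excellent work. I would like to try KarmaDock for virtual screening, but I have one question:

Without a ligand file, can I use the demo to predict the protein pocket and then use that pocket in Demo2 for virtual screening?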

screening results poses

Hi,

When I run the example using pkd1, I get three poses per ligand, e.g.:

BDB2695_pred_uncorrected.sdf
BDB2695_pred_ff_corrected.sdf
BDB2695_pred_align_corrected.sdf

Which one should be kept as the final pose? If only one of the three needs to be considered, is there an argument that prevents the other two from being saved? Saving a single pose would be more practical when multi-million-compound libraries are screened. A script could always delete the other two, but only once the screen is completed, by which point the folder would already be huge.

Thanks for your help,
Christian
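
On the disk-usage concern above: none of the posts collected here documents a flag that suppresses the extra pose files, so cleanup would have to happen outside KarmaDock. A minimal sketch, assuming the three file suffixes listed above; `prune_poses` and its parameters are hypothetical names, and which pose to keep remains the user's choice.

```python
from pathlib import Path

# Suffixes taken from the file listing above; nothing else is assumed about the output.
POSE_SUFFIXES = (
    "_pred_uncorrected.sdf",
    "_pred_ff_corrected.sdf",
    "_pred_align_corrected.sdf",
)


def prune_poses(out_dir, keep="_pred_align_corrected.sdf", dry_run=True):
    """Remove pose files whose suffix is a known pose suffix other than `keep`.

    Returns the names of the files that were (or, with dry_run=True, would be) removed.
    """
    if keep not in POSE_SUFFIXES:
        raise ValueError(f"unknown pose suffix: {keep}")
    removed = []
    for path in sorted(Path(out_dir).iterdir()):
        if not path.is_file():
            continue
        for suffix in POSE_SUFFIXES:
            if suffix != keep and path.name.endswith(suffix):
                removed.append(path.name)
                if not dry_run:
                    path.unlink()
                break
    return removed
```

Running it with `dry_run=True` first shows what would be deleted. Calling it after each batch of ligands, rather than once at the end of the screen, keeps the output folder from growing to three times the needed size.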

Using KarmaDock for virtual screening

Hi Xujun,

Thanks for the great work. I want to try using KarmaDock for my virtual screening and just have a few questions:

For virtual screening with one protein and a compound library, should I use the method listed in Demo1 or Demo2? My guess is Demo2, but I just want to confirm.
For virtual_screening.py, we don't have any crystal ligand files, but we have an idea of where the binding pocket is. Is it possible to specify the binding pocket coordinates without the need for --crystal_ligand_file?

Thanks,

Hien
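
For readers with the same question, the geometric idea behind "specify the pocket by coordinates" can be sketched independently of KarmaDock. The function below is not a KarmaDock option: `residues_near_point` is a hypothetical helper, and the 12 Å default radius is an arbitrary assumption. It only shows how residues around a chosen center could be selected from fixed-column PDB ATOM records.

```python
import math


def residues_near_point(pdb_text, center, radius=12.0):
    """Return sorted (chain, residue number, residue name) tuples for every
    residue that has at least one atom within `radius` angstroms of `center`."""
    selected = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed-column PDB fields (1-based): x 31-38, y 39-46, z 47-54.
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if math.dist(xyz, center) <= radius:
            # resName is columns 18-20, chainID column 22, resSeq columns 23-26.
            selected.add((line[21], int(line[22:26]), line[17:20].strip()))
    return sorted(selected)
```

Whether a pocket selected this way can be passed to virtual_screening.py in place of --crystal_ligand_file is exactly what the question above asks, and it is not answered here.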

Error on Residues Distances

In the source code, the function obtain_edge should, if I'm right, return the minimum and maximum distances between residues at the atomic level. It should therefore use ... dist.max()*0.1 instead of ... dst.max()*0.1.
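
To make the intended quantity concrete, here is a minimal pure-Python sketch of "minimum and maximum atomic distance between two residues". It is not the repository's obtain_edge: `residue_distance_range` is a hypothetical name, and the 0.1 factor simply mirrors the `*0.1` quoted in the issue (its purpose, presumably a unit scaling, is an assumption). The point is that both values come from the same set of pairwise distances.

```python
import math
from itertools import product


def residue_distance_range(res_a, res_b, scale=0.1):
    """Return (min, max) over all atom-pair distances between two residues.

    res_a and res_b are sequences of (x, y, z) atom coordinates; both results
    are multiplied by `scale`.
    """
    dists = [math.dist(a, b) for a, b in product(res_a, res_b)]
    return min(dists) * scale, max(dists) * scale
```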

Functionality for arbitrary protein-ligand pair docking

The current codebase appears well-suited for the processing of protein-ligand complexes, but its multi-step processing nature may present accessibility challenges.

Introducing the capability to perform docking directly from pdb and sdf files without the need for extensive preprocessing, for example, would be a valuable enhancement. This would broaden its utility and potential for wider usage and benchmarking.

Why does the test-set data change before and after the model is instantiated when running ligand_docking.py?

Dependencies versions

Dear Sir,

I hope you are doing well

Could you please specify the versions of KarmaDock's dependencies?
I tried to install KarmaDock on Colab (GPU), but at the step conda env create -f karmadock_env.yaml I got an error that I couldn't solve:
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.

I installed these versions:
PyTorch: 2.0.0.post2
pyg-2.3.1
rdkit=2022.09.1
MDAnalysis[analysis]==2.4.0
ProDy-2.4.1
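
When reporting a version conflict like this one, it helps to list exactly what is installed in the active environment. A small sketch using only the standard library; `report_versions` is a hypothetical helper, and the distribution names in the usage note are assumptions about how the packages above are registered, not a list taken from karmadock_env.yaml.

```python
import platform
from importlib import metadata


def report_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {"python": platform.python_version()}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, `report_versions(["torch", "torch-geometric", "rdkit", "MDAnalysis", "ProDy"])` returns a dictionary that can be pasted into the issue; a value of None means the package metadata was not found under that name.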

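How to generate a pkl file?

A set of pretrained model parameters is provided at trained_models/karmadock_screening.pkl. How was this file produced? Can the parameters be fine-tuned and modified according to my own understanding?
Reading utils/fns.py, I found the save_model method at line 124, which is used in the step method at line 128 below it. In utils/virtual_screening.py and utils/ligand_docking.py I only see load-related methods being used, so I would like to know how to reproduce the trained_models/karmadock_screening.pkl model parameter file.
I previously tried writing the code below, but the result did not seem ideal, and the score argument is a value I made up; I am not sure what would be a reasonable setting: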

import argparse
import os
import sys

from tqdm import tqdm
project_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
sys.path.append(project_dir)

from utils.fns import Early_stopper, set_random_seed
from architecture.KarmaDock_architecture import KarmaDock
import torch.nn as nn
import torch.optim

# device
device_id = 0
if torch.cuda.is_available():
    my_device = f'cuda:{device_id}'
else:
    my_device = 'cpu'

model = KarmaDock()
model = nn.DataParallel(model, device_ids=[device_id], output_device=device_id)
model.to(my_device)

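# early stopper; per utils/fns.py, step() is where save_model is called
model_file = f'{project_dir}/trained_models/try_model.pkl'
stopper = Early_stopper(model_file=model_file,
                        mode='lower', patience=10)
# 50 is an arbitrary placeholder score (see the note above)
stopper.step(50, model)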

The meaning of score?

Thanks for your work. Could you tell me what the score means? These scores come from the docking results. Are they IC50, Kd, Ki, or all of them?

Issues related to generate graphs

Hello, I would like to ask about step 3, "Generate graphs based on protein-ligand complexes". In the example, a "1a30 error" occurs and the graph file is not generated. How can I solve it? Thank you!

IndexError: list index out of range

Traceback (most recent call last):
  File "/home/L/miniconda3/envs/karmadock/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/L/miniconda3/envs/karmadock/lib/python3.9/site-packages/prefetch_generator/__init__.py", line 80, in run
    for item in self.generator:
  File "/home/L/miniconda3/envs/karmadock/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/L/miniconda3/envs/karmadock/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/L/miniconda3/envs/karmadock/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/L/KarmaDock/dataset/dataloader_obj.py", line 31, in __call__
    elem = batch[0]
IndexError: list index out of range

Does anyone know how to fix this? Thanks.
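
The traceback shows the collate function receiving an empty list (`batch[0]` fails). One plausible cause, which the traceback alone does not confirm, is that every sample in a batch failed preprocessing and was dropped before collation. A minimal sketch of a defensive wrapper under that assumption; `drop_failed` is a hypothetical helper, not part of KarmaDock or PyTorch, and it assumes failed samples are represented as None.

```python
def drop_failed(collate_fn):
    """Wrap a collate function so that failed samples (None) are removed first.

    If nothing is left, the wrapper returns None instead of raising IndexError;
    the loop consuming the batches then has to skip None batches.
    """
    def wrapped(batch):
        kept = [item for item in batch if item is not None]
        if not kept:
            return None
        return collate_fn(kept)
    return wrapped
```

This only hides the symptom. Checking why the samples are empty in the first place (for example, input files that could not be parsed) is the more useful fix.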

Meanings of "align_corrected", "ff_corrected" and "uncorrected"

I apologize, but I do not have permission to access your paper, and the meanings of "pred_align_corrected," "pred_ff_corrected," and "pred_uncorrected" poses generated for each molecule are not entirely clear to me.

Based on the results I have reproduced from PDBBind, it appears that "pred_uncorrected" has the highest accuracy and success rate. Does this imply that the "pred_uncorrected" pose is the best, and can I only consider the "pred_uncorrected" pose for my subsequent analysis?

Peptide docking

Can it perform protein-peptide docking?

mdn_score_pred changes across different trials

Hi Xujun,
I find that mdn_score_pred changes when I run the same code again, even though I have set the random seed.
Did you observe something similar? What could be the cause of this randomness?
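
As general background on this kind of question, seeding usually has to cover every random number generator in use, and on a GPU even that may not be enough: some CUDA kernels, such as scatter-style additions built on atomic operations, can accumulate values in a nondeterministic order. The sketch below is not KarmaDock's set_random_seed; `seed_everything` is a hypothetical helper that seeds the standard library, and NumPy and PyTorch only if they are installed.

```python
import os
import random


def seed_everything(seed):
    """Seed Python, and NumPy/PyTorch when available, for more repeatable runs."""
    random.seed(seed)
    # Only affects hash randomization in Python processes started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    # Ask cuDNN for deterministic kernels and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

If small differences remain after this, comparing a CPU run against a GPU run is one way to check whether nondeterministic GPU kernels are the source.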

Does the karmadock code in the repository work on windows 10?

Hi @schrojunzhang,
Thanks for providing such interesting work, well worth looking into!
Does the karmadock code in the repository work on windows 10? Has any testing been done on it?

many thanks,

Sh-Y

Subject: Performance Issues with Virtual Screening Algorithm

Dear Xujun,

I wanted to express my gratitude for the significant contributions your team has made to the development of virtual screening algorithms. Congratulations on the achievements you have earned in this field.

I am writing to seek your assistance with a performance issue I have encountered while attempting to use the program. It appears that the program is running at a relatively slow pace, with Demo2 taking approximately 20 minutes to complete. Notably, I have observed low CPU, GPU, and RAM utilization during the execution, with occasional SSD write activities being the primary bottleneck.

Here are the details of my system configuration:

CPU: Intel Core i7-13700K
GPU: NVIDIA GeForce RTX 4070
Operating System: Windows Subsystem for Linux 2 (WSL2)

I am curious whether there might be any software configurations or settings that could be causing this performance problem. I have explored various avenues to optimize the execution but have yet to find a satisfactory solution.

Your expertise in this field would be greatly appreciated, and any insights or guidance you could provide to help address this performance issue would be invaluable.

Thank you for your time and consideration. I look forward to your response.

Sincerely,
Xinci Shang

Model weights trained based on time-split PDBBind dataset division

Dear Zhang:
Thank you for sharing this excellent work on molecular docking. I am training a similar model on PDBBind with a time-split scheme, and I want to use KarmaDock as a baseline. May I ask for the model weights trained with the time-split scheme mentioned in the article?
Best Regards!
Yu JunLin

Pocket is empty

I used the --crystal_ligand_file you provided, but the predicted pocket is an empty file.

Typos & Request

Doc about total degree: correct ... dim=5 to ... dim=8, or simply remove it.
pos_ture: please search for this phrase; it should be pos_true, right?

Additionally, can the training scripts be released? Thanks!

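Question about model weights

Hello, I am very interested in your docking program; thank you for developing KarmaDock. I read the article you published yesterday. I am curious why you did not train the model on the time split provided by EquiBind in order to make a fair comparison. I understand that the time split is not meaningful, but mentioning an unfair comparison in the article and then using TankBind seems a bit of a stretch. Is it that the results could not match theirs, so the data was left out? I have implemented your approach before, namely DeepDock-style scoring plus conformation optimization (with the conformation optimization swapped for LigPose- and EquiBind-type methods), and the RMSD of the docked poses on the EquiBind test set was not very good (of course, this also depends on how the model is built). Could you tell me whether model weights trained under the time split are available? I look forward to evaluating your model's docking and screening capabilities more systematically. Thank you for your work. It is great!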