qizhipei / fabind



FABind: Fast and Accurate Protein-Ligand Binding (NeurIPS 2023)

Home Page: https://arxiv.org/abs/2310.06763

License: MIT License

Python 100.00%

binding computational-biology docking machine-learning bioinformatics

fabind's Introduction

Qizhi Pei

🔭 I’m Qizhi Pei, a second-year PhD student from Gaoling School of Artificial Intelligence, RUC.
🌱 I’m currently doing research on AI4Science, especially 3D biomolecular modeling and multi-modal learning on biomolecules.
📫 How to reach me:
- Website: https://qizhipei.github.io
- Email: [email protected]

fabind's People

Contributors

jinzhuwei kygao haomingcs analogiks lindsey98 amelie-iska masterwhook eltociear

fabind's Issues

Does FABind provide some confidence/affinity score?

Hi Qizhi,

Does FABind provide a confidence or affinity score that I can use to decide whether a protein-ligand pair binds?
I have tried defining a confidence score as follows:

import torch

# predicted probability that each residue belongs to the pocket
pred_index_true = pocket_cls_pred[i][:j].sigmoid().unsqueeze(-1)
pred_index_false = 1. - pred_index_true
pred_index_prob = torch.cat([pred_index_false, pred_index_true], dim=-1)

pred_index_log_prob = torch.log(pred_index_prob)
pred_index_one_hot = gumbel_softmax_no_random(pred_index_log_prob,
                                              tau=self.args.gs_tau,
                                              hard=self.args.gs_hard)
# column 1 marks the residues selected as pocket
pred_index_one_hot_true = pred_index_one_hot[:, 1].unsqueeze(-1)
pred_confidence_gumbel = pred_index_one_hot_true * pred_index_true
# mean pocket probability over the selected residues
pred_pocket_confidence[i] = pred_confidence_gumbel.sum(dim=0) / pred_index_one_hot_true.sum(dim=0)

However, the resulting confidence scores are very high (> 0.9) even for arbitrary protein-ligand pairs.
Do you have a better suggestion?
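A possible explanation for the consistently high values, assuming gumbel_softmax_no_random behaves like a Gumbel-softmax without the Gumbel noise and gs_hard is enabled: a hard, noise-free choice between log(1 - p) and log(p) selects a residue exactly when p > 0.5 (a positive temperature does not change the argmax), so averaging p over the selected residues can never fall below 0.5 and says little about whether the ligand actually binds. The pure-Python sketch below uses hypothetical function names and made-up logits (it is not FABind code) to reproduce that effect and contrast it with the mean over all residues.

```python
import math
from typing import List, Optional


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def selected_mean_confidence(logits: List[float]) -> Optional[float]:
    """Mean pocket probability over residues picked by a hard, noise-free selection.

    A residue is picked when log(p) > log(1 - p), i.e. when p > 0.5, so every
    averaged value exceeds 0.5 by construction.
    """
    probs = [sigmoid(z) for z in logits]
    selected = [p for p in probs if p > 1.0 - p]
    if not selected:
        return None  # nothing selected: the ratio in the issue would divide by zero
    return sum(selected) / len(selected)


def all_residue_mean(logits: List[float]) -> float:
    """Mean pocket probability over every residue, for comparison."""
    probs = [sigmoid(z) for z in logits]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    logits = [-5.0, -3.0, 0.1]  # made-up: two clear non-pocket residues, one borderline
    print(selected_mean_confidence(logits))  # about 0.525 despite a weak signal
    print(all_residue_mean(logits))          # about 0.19
```

With gs_hard disabled the soft weights still favour high-probability residues, so the weighted mean is biased upward in the same direction.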

Segmentation fault

When running evaluation, an error occurs while iterating through "new_dataset" in the test set, at index 17300. The error message is as follows:
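Because a segmentation fault kills the interpreter before Python can print a traceback, it may help to enable the standard-library faulthandler module (or run the script with `python -X faulthandler`) so that the Python stack is dumped at the moment of the crash, which would show which part of the data loading at index 17300 was running. The helper name and log file name below are hypothetical; writing to a file keeps the dump even if terminal output is lost.

```python
import faulthandler
import sys
from typing import Optional, TextIO


def enable_crash_tracebacks(log_path: Optional[str] = None) -> TextIO:
    """Dump the Python stack of all threads on SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.

    The returned stream must stay open for as long as the handler is active.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return stream


# Usage (hypothetical file name), once at the top of the evaluation script:
# crash_log = enable_crash_tracebacks("segfault_trace.log")
```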

No pretrained model

Could you please provide the pretrained model?

No CUDA runtime is found

Hi, while running the inference script after installing all the required packages, I encountered the error below. I do have an NVIDIA GPU, and it works fine for other projects, but not for this one.

======  preprocess molecules  ======
No CUDA runtime is found, using CUDA_HOME='/usr/bin/nvcc'
======  preprocess proteins  ======
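PyTorch's torch.utils.cpp_extension prints "No CUDA runtime is found, using CUDA_HOME=..." when it finds a CUDA_HOME but torch.cuda.is_available() returns False, for example when a CPU-only PyTorch build is installed. The value in this log, '/usr/bin/nvcc', also appears to be the nvcc executable rather than the toolkit root directory that CUDA_HOME normally names. The diagnostic below is a hypothetical helper, not part of FABind, and is only a starting point for narrowing the problem down.

```python
import os
import shutil
from typing import Any, Dict


def check_cuda_setup() -> Dict[str, Any]:
    """Collect the settings that decide whether PyTorch can see a CUDA runtime."""
    report: Dict[str, Any] = {}
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    report["cuda_home"] = cuda_home
    # CUDA_HOME usually names the toolkit root (the directory containing bin/nvcc),
    # not the nvcc executable itself.
    report["cuda_home_is_dir"] = bool(cuda_home) and os.path.isdir(cuda_home)
    report["nvcc_on_path"] = shutil.which("nvcc")
    try:
        import torch
    except ImportError:
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only PyTorch builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_cuda_setup().items():
        print(f"{key}: {value}")
```

If torch_cuda_build is None, reinstalling a CUDA-enabled PyTorch wheel is the first thing to try; if CUDA_HOME points at a file, unsetting it or pointing it at the toolkit directory may remove the warning.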

I just noticed that fabind_inference.py was updated two weeks ago. Was it updated for FABind+, or is it just an update to FABind? Are you going to create a new project for FABind+ or update the FABind project?

Thank you,
Christian

KeyError 'complex' when running inference on the examples

Hi,

When I used your code to run inference on your example PDBs, I got no results. I checked the error message and found the following:

Traceback (most recent call last):
  File "fabind_inference.py", line 372, in <module>
    post_optim_mol(args, accelerator, data, com_coord_pred, com_coord_pred_per_sample_list, com_coord_per_sample_list, compound_batch, LAS_tmp=LAS_tmp, rigid=args.rigid)
  File "fabind_inference.py", line 288, in post_optim_mol
    com_coord_i = data[i]['compound'].rdkit_coords
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 177, in __getitem__
    return self.get_example(idx)  # type: ignore
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 124, in get_example
    data = separate(
  File "/home/jiayinjun/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/separate.py", line 35, in separate
    attrs = slice_dict[key].keys()
KeyError: 'complex'

Could you please help me with this error? Thank you very much!
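The KeyError is raised inside torch_geometric's Batch.get_example and separate, that is, while splitting a collated batch back into single examples. One plausible cause, an assumption not confirmed in this thread, is a mismatch between the installed torch_geometric version and the one the code was written against. A small helper such as the hypothetical package_versions below makes it easy to include the exact versions in a report.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names can differ from import names; adjust the list to your environment.
    for pkg, ver in package_versions(["torch", "torch_geometric", "rdkit"]).items():
        print(f"{pkg}: {ver}")
```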

Alternatives to SMILES

How can I run inference using a PDB file for the protein and a mol2 or SDF file for the ligand?
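One workaround, assuming the inference script keeps reading ligands from a CSV in the Cleaned_SMILES,pdb_id layout of the bundled example and that the pdb_id column refers to the protein file: convert the SDF or mol2 ligand to SMILES with RDKit first. Going through SMILES discards the 3D coordinates stored in the file. The helper names and output path below are hypothetical; the converter is passed in as a parameter so the CSV part also works without RDKit installed.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List


def rdkit_file_to_smiles(path: str) -> List[str]:
    """Read ligand(s) from an .sdf or .mol2 file and return SMILES strings (requires RDKit)."""
    from rdkit import Chem  # imported lazily so the CSV helper works without RDKit

    suffix = Path(path).suffix.lower()
    if suffix == ".sdf":
        mols = [m for m in Chem.SDMolSupplier(path) if m is not None]
    elif suffix == ".mol2":
        mol = Chem.MolFromMol2File(path)  # reads only the first molecule in the file
        mols = [mol] if mol is not None else []
    else:
        raise ValueError(f"unsupported ligand format: {suffix}")
    return [Chem.MolToSmiles(m) for m in mols]


def write_fabind_csv(out_path: str, ligand_files: Iterable[str], pdb_id: str,
                     to_smiles: Callable[[str], List[str]] = rdkit_file_to_smiles) -> int:
    """Write one Cleaned_SMILES,pdb_id row per ligand and return the number of rows."""
    n_rows = 0
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Cleaned_SMILES", "pdb_id"])
        for path in ligand_files:
            for smiles in to_smiles(path):
                writer.writerow([smiles, pdb_id])
                n_rows += 1
    return n_rows
```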

Binding scores?

Hi,

I was able to run your examples without problems. However, I did not find a file containing binding scores. Does FABind generate only poses, so that I need a rescoring tool such as BR-NiB to rank the binding poses?

Thanks for your help,
Christian

Not all submitted ligands get through

Processing multiple smiles on one target

Hi,

This is not an issue per se, rather a question and some suggestions.

In your example, the input file is structured as follows:
Cleaned_SMILES,pdb_id
CCC@H C@HC(=O)NC@@HC(=O)NC@@HC(=O)NC@HC(C)C,6efk
CC(C)CCN1c2nc(Nc3cc(F)c(O)c(F)c3)ncc2N(C)C(=O)C1(C)C,6g3c
CC(C)(COP@@(O)OP@(O)OC[C@H]1OC@@H C@H[C@@h]1OP(=O)(O)O)C@@HC(=O)NCCC(=O)NCCO,6n93
O=C(O)c1ccccc1-n1cccc1,6npi

The outputs are SDF files named with the pdb_id plus a number. If I were to generate millions of poses against one receptor, as in a virtual screening setup, it would be more practical to name the files by compound ID instead of protein ID. This is just a suggestion for a future update, in case you do not already have a screening mode: if a cmpd_id column were present in the header, each SDF file could be named after its cmpd_id instead of the pdb_id with an appended number.
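The suggested naming scheme could look roughly like the sketch below: a hypothetical cmpd_id column (not part of the current input format) takes precedence, with the pdb_id plus a row counter as the fallback. The fallback pattern is illustrative and not necessarily the one FABind uses.

```python
import csv
import io
from typing import List


def output_names(csv_text: str) -> List[str]:
    """Map each input row to an output SDF file name (hypothetical naming scheme)."""
    names = []
    for idx, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        cmpd_id = (row.get("cmpd_id") or "").strip()
        # Prefer the compound ID; otherwise fall back to pdb_id plus the row index.
        names.append(f"{cmpd_id}.sdf" if cmpd_id else f"{row['pdb_id']}_{idx}.sdf")
    return names
```

Duplicate compound IDs would overwrite each other's output, so a real implementation would also need a uniqueness check.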

The generated poses are saved as SDF. Could they be saved as mol2 instead? This is not a big problem, since I can convert them, but converting millions of files takes time, and most rescoring tools accept mol2, so writing mol2 directly would save that step.
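Until mol2 output is available, the conversion can at least be scripted. RDKit can read mol2 but does not write it, so this sketch shells out to Open Babel, assumed to be installed and on PATH as obabel, with one call per SDF file. The directory layout and function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def mol2_commands(sdf_dir: str, out_dir: str) -> List[List[str]]:
    """Build one Open Babel command per SDF pose file in sdf_dir."""
    cmds = []
    for sdf in sorted(Path(sdf_dir).glob("*.sdf")):
        target = Path(out_dir) / (sdf.stem + ".mol2")
        cmds.append(["obabel", str(sdf), "-O", str(target)])
    return cmds


def convert_all(sdf_dir: str, out_dir: str) -> None:
    """Run the conversions; raises CalledProcessError if obabel fails on a file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in mol2_commands(sdf_dir, out_dir):
        subprocess.run(cmd, check=True, capture_output=True)
```

Building the command list separately from running it makes it easy to inspect or parallelize the calls for very large screens.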

Best,
Christian

Reproducibility of Results with Inference Mode

Hi,

I'm quite impressed with the concept presented in your work; it has the potential to save considerable time. I attempted to reproduce your results in inference mode, as described in the README, but found discrepancies in both the RMSD scores and the visualizations compared with the reported results.

Here are the RMSD scores I obtained:

6g3c rmsd: 13.472411671189839
6npi rmsd: 13.028547491615917
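When comparing reproduced numbers with published ones, it helps to be sure both sides compute RMSD the same way. The sketch below is the plain ligand RMSD commonly used for docking, assuming matched atom order, both poses in the protein's coordinate frame, and no superposition or symmetry correction; the function name is hypothetical, and FABind's own evaluation code may differ in details such as hydrogen handling.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD between two poses with identical atom order, without any alignment."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and the same length")
    sq = sum((px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
             for (px, py, pz), (rx, ry, rz) in zip(pred, ref))
    return math.sqrt(sq / len(pred))
```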

Additionally, I observed differences in the docking visualizations between my replication attempts in inference mode and the results reported in your paper:

PDB ID: 6G3C (Cyan = FABind, Yellow = ground truth, Purple = DiffDock)

[Figure: two panels, "Reproduced with Inference" and "Reported in Paper"]

PDB ID: 6NPI (Cyan = FABind, Yellow = ground truth, Purple = DiffDock)

[Figure: two panels, "Reproduced with Inference" and "Reported in Paper"]

Could you kindly offer any advice or insights on how to replicate your published results accurately?

Best regards,
Ahmet

No Results Written to uid_smiles_sdfname.csv After Running Inference on Custom Complexes

After following the instructions provided in the README to run inference on custom complexes, I noticed that the uid_smiles_sdfname.csv file was created, but no results were written into it.

I suspect the issue is caused by an error raised in post_optim_mol at line 371 of fabind_inference.py. To investigate, I removed the try block surrounding this part of the code, which produced the following error message:

Traceback (most recent call last):
  File "fabind_inference.py", line 382, in <module>
    post_optim_mol(args, accelerator, data, com_coord_pred, com_coord_pred_per_sample_list, com_coord_per_sample_list, compound_batch, LAS_tmp=LAS_tmp, rigid=args.rigid)
  File "fabind_inference.py", line 288, in post_optim_mol
    com_coord_i = data[i]['compound'].rdkit_coords
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 177, in __getitem__
    return self.get_example(idx)  # type: ignore
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/batch.py", line 124, in get_example
    data = separate(
  File "/home/miniconda3/envs/fabind/lib/python3.8/site-packages/torch_geometric/data/separate.py", line 35, in separate
    attrs = slice_dict[key].keys()
KeyError: 'complex'

It seems this error prevents the results from being written to uid_smiles_sdfname.csv. Any help resolving it so that the results are written to the file would be greatly appreciated.
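A try block that silently skips failures makes this kind of problem look like an empty output file. A minimal alternative, sketched with hypothetical names (step stands in for a per-sample call such as post_optim_mol; FABind's actual loop is not reproduced here), keeps processing the remaining samples but logs each failure with its full traceback and returns the list of failed indices.

```python
import logging
import traceback
from typing import Any, Callable, Iterable, List, Tuple

logger = logging.getLogger("fabind_postprocess")


def run_per_sample(samples: Iterable[Any],
                   step: Callable[[Any], Any]) -> Tuple[List[Any], List[Tuple[int, str]]]:
    """Apply step to each sample; record (index, traceback) for failures instead of hiding them."""
    results: List[Any] = []
    failures: List[Tuple[int, str]] = []
    for idx, sample in enumerate(samples):
        try:
            results.append(step(sample))
        except Exception:
            tb = traceback.format_exc()
            logger.error("post-processing failed for sample %d:\n%s", idx, tb)
            failures.append((idx, tb))
    return results, failures
```

Checking whether failures is empty after the loop then distinguishes "no results because nothing succeeded" from "no results because nothing was written".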