octavian-ganea / equidock_public
EquiDock: geometric deep learning for fast rigid 3D protein-protein docking
License: MIT License
Hi there,
I'd like to report an installation bug. So far my workaround has been to use dgl==0.9.0 rather than the dgl==0.7.0 pinned in the requirements.
Also, is there an easy way to run the models on a custom set of PDBs? I'd prefer not to modify inference_rigid.py, but it seems there is no way to pass a custom set other than, perhaps, inserting it as test data.
Hi, dear authors of EquiDock. I was very sad to hear that Octavian Ganea passed away before he could fully show his extraordinary talent.
I was going over the Kabsch computation and found the construction of the rotation matrix somewhat confusing. Specifically, U, S, Vt = np.linalg.svd(H) gives us U, S, V^T, which correspond to U2, S, U1^T in the paper. The rotation matrix is then obtained via R = Vt.T @ U.T, which differs from what is described in the text: there, R = U2 @ U1^T, which should be R = U @ Vt in the code. Do you agree?
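For readers hitting the same question, here is a generic NumPy sketch of the Kabsch rotation (a standard textbook implementation, not EquiDock's exact code). Note that which formula is "right" depends purely on whether the cross-covariance is built as P^T Q or Q^T P, i.e. on which SVD factor the paper names U1:

```python
import numpy as np

def kabsch_rotation(P, Q):
    """Rotation R minimizing ||P @ R.T - Q||_F, where rows of P and Q are
    paired 3D points (P is rotated onto Q)."""
    H = P.T @ Q                      # 3x3 cross-covariance, H = sum_i p_i q_i^T
    U, S, Vt = np.linalg.svd(H)      # NumPy returns V transposed (Vt = V^T)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])       # reflection guard: force det(R) = +1
    return Vt.T @ D @ U.T            # R = V D U^T
```

With this convention the answer is R = V U^T; if H is instead built as Q^T P, the same derivation yields U V^T of the swapped factors, which is how the paper's U2 U1^T and the code's Vt.T @ U.T can both be correct under different naming.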
When I run the command as follows:
python preprocess_raw_data.py -n_jobs 60 -data dips -graph_nodes residues -graph_cutoff 30 -graph_max_neighbor 10 -graph_residue_loc_is_alphaC -pocket_cutoff 8 -data_fraction 1.0
it generates six files in the directory /extendplus/jiashan/equidock_public/src/cache/dips_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_0:
label_test.pkl ligand_graph_test.bin receptor_graph_test.bin
label_val.pkl ligand_graph_val.bin receptor_graph_val.bin
However, the remaining three (train) files are not generated; the process dies with the following output:
Processing ./cache/dips_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_0/label_frac_1.0_train.pkl
Num of pairs in train = 39901
Killed
Could you help me solve this problem?
Thanks!
Hi, thanks for the great work!! I have a question regarding the following point in the paper:
On p.7 it is stated that:
we unfortunately do not know the actual alignment between points in $Y_l$ and $P_l$, for every $l \in \{1, 2\}$. This can be recovered using an additional optimal transport loss
However, in the code here :
https://github.com/octavian-ganea/equidock_public/blob/main/src/train.py#L128
The optimal transport matrix (the 2nd returned variable) is ignored:
ot_dist, _ = compute_ot_emd(cost_mat_ligand + cost_mat_receptor, args['device'])
In my understanding, this matrix should be used to recover the alignment.
So I am confused: how can the point alignment be recovered without the optimal transport matrix?
Thank you so much again!
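On the alignment point: since the marginals in this loss are uniform over equally many points, the exact-EMD plan is a permutation matrix scaled by 1/n, so a hard alignment can be read off from the plan (or, equivalently, computed with a Hungarian solver). A generic sketch of that equivalence, not the repo's compute_ot_emd (SciPy is used here purely for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def recover_alignment(cost):
    """Hard alignment from an n x n cost matrix with uniform marginals.
    For uniform weights over equally many points, the exact-EMD transport
    plan is (1/n) times a permutation matrix (Birkhoff's theorem), so the
    Hungarian solver recovers the same alignment the OT plan encodes."""
    n = cost.shape[0]
    rows, cols = linear_sum_assignment(cost)
    T = np.zeros_like(cost, dtype=float)
    T[rows, cols] = 1.0 / n                   # the corresponding OT plan
    ot_cost = float(cost[rows, cols].mean())  # equals sum(T * cost)
    return cols, T, ot_cost
```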
dgl version 0.7.0 is no longer available for installation (and neither is the pinned POT version). Could you provide an updated set of compatible library versions?
Thank you for this great tool.
I started installing it on Ubuntu 20.04 and ran into multiple FileNotFoundError messages. Are there any dependencies or setup steps not listed?
Here are the errors:
(base) nc1@nc1-UA9C-R38:~/equidock_public-main$ # Extract the raw PDB files:
(base) nc1@nc1-UA9C-R38:~/equidock_public-main$ python3 project/datasets/builder/extract_raw_pdb_gz_archives.py project/datasets/DIPS/raw/pdb
python3: can't open file '/home/nc1/equidock_public-main/project/datasets/builder/extract_raw_pdb_gz_archives.py': [Errno 2] No such file or directory
(base) nc1@nc1-UA9C-R38:~/equidock_public-main$ # Process the raw PDB data into associated pair files:
(base) nc1@nc1-UA9C-R38:~/equidock_public-main$ python3 project/datasets/builder/make_dataset.py project/datasets/DIPS/raw/pdb project/datasets/DIPS/interim --num_cpus 28 --source_type rcsb --bound
python3: can't open file '/home/nc1/equidock_public-main/project/datasets/builder/make_dataset.py': [Errno 2] No such file or directory
(base) nc1@nc1-UA9C-R38:~/equidock_public-main$ # Apply additional filtering criteria:
(base) nc1@nc1-UA9C-R38:~/equidock_public-main$ python3 project/datasets/builder/prune_pairs.py project/datasets/DIPS/interim/pairs project/datasets/DIPS/filters project/datasets/DIPS/interim/pairs-pruned --num_cpus 28
python3: can't open file '/home/nc1/equidock_public-main/project/datasets/builder/prune_pairs.py': [Errno 2] No such file or directory
Hello!
I was working with your code and noticed that the best-validation metric used in the project (val_complex_rmsd_median) differs from the one stated in the article presenting EquiDock (val_ligand_rmsd_median). Is there a reason behind this choice, or am I misinterpreting something?
Line 372 in ac2c754
python==3.9.10
numpy==1.22.1
cuda==10.1
torch==1.10.2
dgl==0.7.0
biopandas==0.2.8
ot==0.7.0
rdkit==2021.09.4
dgllife==0.2.8
joblib==1.1.0
Shouldn't 'ot' be 'POT', for 'Python Optimal Transport'?
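(For reference: on PyPI the package is indeed published as POT, while the import name in code is ot. Under that assumption, the version list above as a requirements.txt sketch; the pip package names marked below are best-effort guesses and may need adjusting:)

```
numpy==1.22.1
torch==1.10.2
dgl==0.7.0            # may require a CUDA-matched build, e.g. dgl-cu101
biopandas==0.2.8
POT==0.7.0            # imported in code as `ot`
rdkit-pypi==2021.9.4  # pip wheel name for RDKit at that time (assumption)
dgllife==0.2.8
joblib==1.1.0
```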
Great work, but I don't see any documentation for the inference script, or even an ArgumentParser. I think I can figure it out from the code, but it would be nice to have a simple docking-inference entry point to use for predictions. Apologies if this information is somewhere else and I missed it.
Hi,
I am having some issues understanding test_sets_pdb: the original (undocked) structures are in db5_test_random_transform, with the ligands (the part that moves) in random_transformed, the receptors (not movable) in complexes, and the results in db5_equidock_results.
So, for example, if we take from db5_equidock_results the pose 1AVX_l_b_EQUIDOCK.pdb, this means that 1AVX_l_b.pdb was used as the ligand (movable) and 1AVX_r_b_complex.pdb as the receptor (not moved). If I superimpose 1AVX_l_b_EQUIDOCK.pdb and 1AVX_r_b_complex.pdb in PyMOL, I should therefore get a nicely docked complex; however, this is not the case. There are many, many clashes.
Can you help?
Best,
Liviu
Hello,
When I run the command: python preprocess_raw_data.py -n_jobs 20 -data db5 -graph_nodes residues -graph_cutoff 30 -graph_max_neighbor 10 -graph_residue_loc_is_alphaC -pocket_cutoff 8
I get the following error:
Processing split 1
Processing ./cache/db5_residues_maxneighbor_10_cutoff_30.0_pocketCut_8.0/cv_1\label_val.pkl
Traceback (most recent call last):
File "C:\Users\equidock_public-main\src\preprocess_raw_data.py", line 37, in
Unbound_Bound_Data(args, reload_mode='val', load_from_cache=False, raw_data_path=raw_data_path,
File "C:\Users\equidock_public-main\src\utils\db5_data.py", line 78, in init
with open(os.path.join(split_files_path, reload_mode + '.txt'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/benchmark5.5/cv/cv_0\cv_1\val.txt'
Any suggestions?
Thanks
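A guess at the root cause (an assumption inferred from the mixed separators and the doubled cv_0\cv_1 in the error path): a hardcoded forward-slash base path is being combined with os.path.join, which uses backslashes on Windows. pathlib makes the join portable; the helper and path names below are hypothetical, for illustration only:

```python
from pathlib import Path

def split_file(split_root, cv_split, mode):
    # Hypothetical helper: build the split-file path portably.
    # Path inserts the correct separator on both Windows and Linux,
    # avoiding mixed strings like './a/b\\c' that os.path.join can
    # produce when a hardcoded '/'-separated prefix is joined on Windows.
    return Path(split_root) / f"cv_{cv_split}" / f"{mode}.txt"
```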
After reading your paper, I have a question: what is the difference between rigid protein docking and protein-protein docking? In my understanding they are almost the same task. Is that right?
Add -W (--whole-file) to skip rsync's delta-transfer check:
rsync -rlpt -v -W -z --delete --port=33444 rsync.rcsb.org::ftp_data/biounit/coordinates/divided/ ./DIPS/raw/pdb
Hope it helps!
It's me again; I have another question.
I am trying to reproduce results from your test_sets_pdb, but in the folder test_sets_pdb/db5_equidock_results I do not see the docked complex structures. Where are they?
Also, in the folder db5_test_random_transformed there is a subfolder called complexes. If we are docking, why do we need the complexes at all?
Your help is greatly appreciated.
Although we have never met, your work has led me into new territory. I was shocked and saddened by the news. You have my lasting respect and gratitude.
Hello !
It is a bit unclear to me which hyperparameters you used to train your models. Could you provide the complete configuration of your best models for DIPS and DB5? In particular, I am not sure whether node and edge features were used. Moreover, the hyperparameters mentioned in the paper do not match those in your best models' checkpoints.
Thanks :)
As we can see in DIPS-Plus, the author mentioned in issue "about make dips dataset" (BioinfoMachineLearning/DIPS-Plus#7) a
"deadlock of sorts after a certain number of complexes have been processed."
When run for a long time it appears intermittently, but on a small number of files it always succeeds, so I split up the make_dataset step.
First, make several separate folders, e.g. tmp1, tmp2, ...:
mkdir tmp1
Then cd into pdb and move batches of files into the new folders:
mv $(ls | head -200) ../tmp6
Finally, run make_dataset.py on each batch separately:
python3 make_dataset.py project/datasets/DIPS/raw/tmp1 project/datasets/DIPS/interim --num_cpus 24 --source_type rcsb --bound
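The manual batching above can be sketched as a small shell function (directory names like tmp1 follow the poster's scheme; the batch size of 200 is illustrative):

```shell
# Move the files of SRC_DIR into numbered batch directories
# (PREFIX1, PREFIX2, ...) of at most BATCH_SIZE files each, so that
# make_dataset.py can be run on each batch separately.
split_into_batches() {
  src=$1; prefix=$2; size=$3
  i=1; count=0
  mkdir -p "${prefix}${i}"
  for f in "$src"/*; do
    mv "$f" "${prefix}${i}/"
    count=$((count + 1))
    if [ "$count" -ge "$size" ]; then
      i=$((i + 1)); mkdir -p "${prefix}${i}"; count=0
    fi
  done
}

# Example (hypothetical layout): split_into_batches pdb ../tmp 200
```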
Sorry to bother you, but could you provide the preprocessed DIPS dataset? Downloading DIPS is very slow for me.
Hi 👋! In the paper it is mentioned: "For DIPS, the split is based on protein family to separate similar proteins". Is there source code for this split? I could only find a random split in paritition_dips.py.
Hi @octavian-ganea ,
Thank you for your great work!
I have a question about Figure 12. Could you tell me how Figure 12 was produced?
In fact, I tried to reproduce it as follows.
First, I took the bound and unbound data from the folder data/benchmark5.5/structures/.
Then, I calculated the CRMSD and IRMSD of the bound and unbound structures following the code in eval_pdb_outputset.py, using the unbound coordinates as the 'pred_coord' and the bound ones as the 'ground_truth_coord'. However, for most pairs the number of residues in the bound and corresponding unbound structures is not equal (to be exact, 179 ligands and 28 receptors did not match). How did you match the corresponding bound and unbound structures? Or did I make a mistake somewhere?
Looking forward to your reply! Thanks!
Hi, I created a virtual environment and tried to pip install the dependencies listed in README.md. However, I'm not able to install some of them (e.g. cuda & dgl==0.7.0). Could you provide a requirements.txt for installing the dependencies?
Thank you! :)
Hi @AxelGiottonini, when I run the code with the command CUDA_VISIBLE_DEVICES=0 python -m src.train -hyper_search, I get the following results:
[2023-03-09 05:47:00.038149] [FINAL TEST for dips] --> epoch -1/10000 || mean/median complex rmsd 16.2906 / 15.7649 || mean/median ligand rmsd 35.9814 / 33.6197 || mean/median sqrt pocket OT loss 28.6057 || intersection loss 21.3417 || mean/median receptor rmsd 0.0000 / 0.0000
[2023-03-09 09:37:52.987988] [FINAL TEST for db5] --> epoch -1/10000 || mean/median complex rmsd 16.7756 / 16.5510 || mean/median ligand rmsd 40.3175 / 36.7189 || mean/median sqrt pocket OT loss 31.0876 || intersection loss 28.0045 || mean/median receptor rmsd 0.0000 / 0.0000
From these results, the mean complex RMSD on the DIPS test set is 16.29 and on the DB5 test set 16.77, but Table 1 of the paper reports 14.52 for DIPS and 14.72 for DB5, i.e. my runs perform worse than reported in the original paper.
What is wrong?
And how can I compute the interface RMSD?
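On the interface RMSD question: a common (CAPRI-style) definition restricts the RMSD to interface residues, i.e. ligand residues whose ground-truth position lies within some cutoff (often 8-10 Å) of the receptor. A hedged NumPy sketch of that idea, not EquiDock's evaluation code (CAPRI's I-RMSD additionally superimposes the interface backbones first, which is omitted here):

```python
import numpy as np

def interface_rmsd(pred_lig, true_lig, true_rec, cutoff=8.0):
    """RMSD over ligand points (e.g. C-alpha coordinates, shape (n, 3))
    whose ground-truth position lies within `cutoff` of any receptor point."""
    # pairwise distances between true ligand and receptor points
    d = np.linalg.norm(true_lig[:, None, :] - true_rec[None, :, :], axis=-1)
    iface = d.min(axis=1) < cutoff   # mask of interface ligand points
    diff = pred_lig[iface] - true_lig[iface]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```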
The code does not work. There is no documentation. Please provide a working example.