Comments (13)

octavian-ganea commented on May 24, 2024

"I assume you search over SOME possible poses which exclude the volume of the receptor" --> please read our paper carefully before making statements which are not true. Thanks.

Again, to summarize: we have released post-processed DIPS and DB5 outputs and the corresponding script to remove steric clashes from the output of any trained model (any small remaining clashes can be removed completely by tuning that script's parameters, such as the number of GD iterations and the learning rate). We are also working on handling steric clashes as hard model constraints and will keep you updated on that. Thanks for your useful insights.

from equidock_public.

octavian-ganea commented on May 24, 2024

You do not have to superimpose 1AVX_l_b_EQUIDOCK.pdb and 1AVX_r_b_complex.pdb; they should already be in the docked position. Maybe that is the source of the many clashes? Also, our model predictions do show some clashes, because our non-intersection constraint is currently a soft constraint. This is especially true when the test distribution diverges a lot from the train distribution (which is likely for DIPS given the protein family dataset splits). We are working on this aspect.

LivC193 commented on May 24, 2024

Hi,
Thank you for the answer. The output files in our experiments, and in your examples (https://github.com/octavian-ganea/equidock_public/blob/d65c674b1e5fb4c062a3c8ab2fd276dfc1c1fe90/test_sets_pdb/dips_equidock_results/a9_1a95.pdb1_3.dill_l_b_EQUIDOCK.pdb), contain just one chain (the ligand), not the receptor. How can I see the docked complex when the result files have just 1 chain instead of 2?

octavian-ganea commented on May 24, 2024

The receptor is fixed to the ground truth receptor; only the ligand is moved. The ground truth receptor is: test_sets_pdb/dips_test_random_transformed/complexes/a9_1a95.pdb1_3.dill_r_b_COMPLEX.pdb. Note that this does not apply to ATTRACT and HDOCK, which move both receptor and ligand; for instance, db5_attract_results/ has both ligand and receptor pdb files.

LivC193 commented on May 24, 2024

Apologies if I wasn't clear earlier. I will try to reformulate what I am trying to achieve.
I want to dock a protein-ligand (one chain) to a protein-receptor (another chain). I need to do this because I want to calculate different structural features of the co-structure that results from docking. This allows me to identify critical residues, interface pockets, hydrogen bonding, etc. However, your algorithm outputs just the ligand pose, which makes it impossible to calculate any of the features I am interested in. Thus, I need to assemble the co-structure. I will give an example below.

If we have 2 chains (A and B), with A being the receptor and B being the ligand, and their ground truth positions, your algorithm will fix A in place and will move B around until a final position is obtained. Since A has not been moved from its original position, I can open both A and docked-B in PyMOL, for example, and I should get the docked complex in order to assess the interface. However, when I do this for any example that I run, or that you put in your examples, I get massive clashes. You can find some of your examples below and how they look in 3D.

First example 7CEI: undocked ligand in RED, docked ligand in GREEN, fixed receptor in MAGENTA, ground truth receptor in PINK. PINK and MAGENTA overlap perfectly, as expected since the receptor does not move; however, you can clearly see massive clashes between the GREEN and MAGENTA/PINK chains.
Second example 2AYO: same colour scheme, even worse clashes.

This behaviour is constant in all examples and in all my runs.
7CEI_docked
2AYO_docked
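The clashes described above can also be quantified rather than judged by eye. The following is a minimal sketch, not part of the EquiDock repository: `parse_atoms` and `count_clashes` are hypothetical helper names, and the 3.0 Å heavy-atom cutoff is an assumed threshold chosen for illustration. It reads the fixed-width coordinate columns of PDB `ATOM`/`HETATM` records and counts receptor/ligand heavy-atom pairs that sit closer than the cutoff.

```python
import math


def parse_atoms(pdb_text):
    """Return (x, y, z) tuples for heavy atoms in ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Element symbol lives in columns 77-78; fall back to the atom name
        # (columns 13-16, leading digits dropped) when it is blank.
        name = line[12:16].strip().lstrip("0123456789")
        element = line[76:78].strip() or name[:1]
        if element.upper() == "H":
            continue  # hydrogens are ignored in this sketch
        # Coordinates occupy columns 31-38, 39-46 and 47-54.
        atoms.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return atoms


def count_clashes(receptor_atoms, ligand_atoms, cutoff=3.0):
    """Count receptor/ligand atom pairs closer than `cutoff` angstroms.

    The 3.0 default is an assumed threshold, not a value from the thread.
    """
    return sum(
        1
        for r in receptor_atoms
        for l in ligand_atoms
        if math.dist(r, l) < cutoff
    )
```

Running `count_clashes(parse_atoms(receptor_text), parse_atoms(ligand_text))` on the ground truth receptor file and the predicted ligand file gives one number per complex, which makes it easier to compare outputs before and after any post-processing. The all-pairs loop is quadratic, so a spatial grid or k-d tree would be preferable for large complexes.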

octavian-ganea commented on May 24, 2024

As said above, our current non-intersection loss is a soft constraint, so some clashes do happen, especially when the test distribution diverges a lot from the train distribution (which is likely for DIPS given the protein family dataset splits). We are working on fixing this issue directly in the model. In the meantime, I just pushed code that fixes this as a post-processing step. You can run it by setting this variable to True: https://github.com/octavian-ganea/equidock_public/blob/main/src/inference_rigid.py#L19. You should no longer expect steric clashes, but some small "holes" are possible. Note that further improved predictions might be obtained by fine-tuning EquiDock outputs with a local fine-tuning method such as https://new.rosettacommons.org/demos/latest/tutorials/Protein-Protein-Docking/Protein-Protein-Docking#local-docking . We have already tried FireDock for fine-tuning, but the results were worse. I hope this helps.
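To make the idea of a gradient-descent post-processing step concrete, here is a deliberately simplified sketch. It is not the repository's script: it optimizes only a translation of the ligand (no rotation), uses a hypothetical pairwise penalty `max(0, min_dist - d)^2`, and the names `clash_penalty_and_grad` and `remove_clashes` and all default values are assumptions. It does expose the two knobs mentioned in this thread, the number of GD iterations and the learning rate.

```python
import math


def clash_penalty_and_grad(receptor, ligand, shift, min_dist=3.0):
    """Soft clash penalty and its gradient with respect to the ligand shift."""
    penalty = 0.0
    grad = [0.0, 0.0, 0.0]
    for r in receptor:
        for l in ligand:
            diff = [l[k] + shift[k] - r[k] for k in range(3)]
            d = math.sqrt(sum(c * c for c in diff))
            if d >= min_dist:
                continue  # this pair does not overlap
            if d < 1e-9:
                # Coincident atoms: pick an arbitrary push direction.
                diff, d = [1e-9, 0.0, 0.0], 1e-9
            overlap = min_dist - d
            penalty += overlap ** 2
            for k in range(3):
                # d/dshift of (min_dist - d)^2 is -2 * overlap * diff / d
                grad[k] -= 2.0 * overlap * diff[k] / d
    return penalty, grad


def remove_clashes(receptor, ligand, iterations=200, learning_rate=0.1, min_dist=3.0):
    """Translate the ligand by gradient descent until the penalty vanishes."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        penalty, grad = clash_penalty_and_grad(receptor, ligand, shift, min_dist)
        if penalty == 0.0:
            break
        shift = [shift[k] - learning_rate * grad[k] for k in range(3)]
    return shift
```

With many overlapping atom pairs the summed gradient grows, so the learning rate has to be scaled down accordingly or the ligand will overshoot. A translation-only search can also push the ligand away from the interface entirely, which is one way the small "holes" mentioned above could arise.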

LivC193 commented on May 24, 2024

Amazing, I will run the fix and come back.

To your point that "some clashes do happen", I want to make it clear that clashes happen in absolutely all of the examples present in your repo, regardless of which training dataset you have been using. Again, I am talking about the examples you provide, not my runs.

Next, Rosetta local docking requires the proteins to be in proximity (8-10 Å apart); your outputs are 8-10 Å into each other. Rosetta local docking will fail to even start if I give it your outputs. Moreover, it would be impossible to give Rosetta your output, because your output has just 1 chain and docking requires at least 2. Which begs the question: why are you outputting just one chain and not the whole complex, like all other docking algorithms?

octavian-ganea commented on May 24, 2024

Re 2 files: You can simply concatenate the two PDB files into a single PDB file :).

Re Rosetta: please try using the new script.
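For readers who want to follow the concatenation suggestion, a minimal sketch is shown below. It is not code from the repository: `merge_pdb` is a hypothetical helper, and it goes slightly beyond plain concatenation by assuming that downstream tools want distinct chain IDs, contiguous atom serials, and a single trailing `END` record. The chain letters "A" and "B" are arbitrary defaults.

```python
def merge_pdb(receptor_text, ligand_text, receptor_chain="A", ligand_chain="B"):
    """Combine two single-chain PDB texts into one two-chain complex."""
    out = []
    serial = 1  # note: PDB serials are 5 columns wide, so this assumes < 100000 atoms
    for text, chain in ((receptor_text, receptor_chain), (ligand_text, ligand_chain)):
        for line in text.splitlines():
            if not line.startswith(("ATOM", "HETATM")):
                continue  # drop END/TER/header records from the inputs
            line = line.ljust(80)
            # Columns 7-11 hold the atom serial, column 22 the chain ID;
            # everything else (names, residues, coordinates) is kept verbatim.
            out.append(f"{line[:6]}{serial:5d}{line[11:21]}{chain}{line[22:]}".rstrip())
            serial += 1
        out.append("TER")  # close each chain
    out.append("END")
    return "\n".join(out) + "\n"
```

A typical use would read the ground truth receptor file and the predicted ligand file as text, pass both to `merge_pdb`, and write the result to a new `.pdb` file that a viewer or refinement tool can open as a two-chain complex. Coordinates are never modified, so the merged file shows exactly the pose the model produced.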

LivC193 commented on May 24, 2024

I don't want to be inconsiderate or rude, but please understand that none of the examples in your repo work from either a computational or a biological point of view.

I tried to run your fix and it doesn't work. Could you please run it and show for 7CEI and 2AYO the outputs concatenated (meaning the actual complex of docking ligand and ground truth receptor) for each example?

octavian-ganea commented on May 24, 2024

What do you mean by "examples do not work from a computational point of view"? In what sense?

Also, can you detail what you mean by "doesn't work from a biological point of view"? I just uploaded the predictions without steric clashes (using the script I mentioned above) in the directories test_sets_pdb/dips_equidock_no_clashes_results/ and test_sets_pdb/db5_equidock_no_clashes_results/. As said above, for the most biologically plausible outputs, it is currently best to run a local fine-tuning method such as Rosetta local docking. We have also done this for https://arxiv.org/abs/2202.05146 with very good outcomes. We are currently investigating extensions of EquiDock that would hard-code no-clash constraints in our deep learning model, as opposed to soft constraints, and will certainly keep the community updated once we have a new version of EquiDock.

Thanks for your input,

LivC193 commented on May 24, 2024

"examples do not work from a computational point of view": clashes, especially those presented above, should be heavily penalised. If those are the final poses, it means no other pose was better, which means no other pose had fewer clashes, which means that during the search (for the best pose) no pose with zero clashes was sampled.

"doesn't work from a biological point of view": the idea behind global docking is to give the user a sense of the possible interfaces between 2 chains, which will be further refined during local docking to arrive at a putative docking pose that resembles what actually happens in vivo/in vitro. However, the poses presented above do not hold any biological relevance, even theoretically, and no post-refinement can be done on them.

https://arxiv.org/abs/2202.0514 -> this is protein-small ligand docking, which is different from protein-protein docking on every level. I could not find the paragraph in which you mention the Rosetta local docking method in this paper, and in Rosetta docking the scoring functions for small ligands are different from those for protein docking.

octavian-ganea commented on May 24, 2024

Re your comments on "examples do not work from a computational point of view": I am sorry, but I think you are confusing the terms. Bottlenecks that would prevent an algorithm from working computationally are related to run time, memory footprint, etc. Our algorithm does not suffer from computational issues (in the computer science sense of the term). Also, you assume that our method works by searching over possible poses, which is what previous methods do, but not ours. Our method directly generates keypoints that represent the binding interface, so your statement is not true.

"However, the above presented poses do not hold any biological relevance, even theoretically, and no post-refinement can be done on them." --> I already said above that clashes have been resolved in test_sets_pdb/dips_equidock_no_clashes_results/ and test_sets_pdb/db5_equidock_no_clashes_results/. Why exactly can't one do a post-refinement on them? Also, on what arguments do you base the statement "do not hold any biological relevance"?

Re our drug binding paper: it was just an example that a refinement with a local method was very successful for us. For that paper we used Smina.

Best,

LivC193 commented on May 24, 2024

"examples do not work from a computational point of view": To your first argument, if that were the case you could just apply a random translation/rotation matrix to the input. That would be very computationally cheap. For the second part, I do NOT assume you search over ALL possible poses; I assume you search over SOME possible poses which exclude the volume of the receptor, which, based on the original/published results, is not the case. This is the computational aspect I am talking about; apologies if my methodology was not exact enough.

"biological relevance, even theoretic": I cannot do post-refinement on the original structures, because the refinement needs to be fully atomistic and there are too many clashes to be resolved. The clashes have not been resolved; they have been diminished. There are still clashes, as can be observed in the attached pictures, and this is not the point. The "fix", at inference time, has been added today (2022), while the paper with all the relevant metrics was done in 2021. So how should I trust those results now, especially the ones regarding speed and invariance?

The argument on which I base my statement is that no 2 protein chains can mesh into one another, not here anyway. If you want me to be more exact, no protein can ever have any clashes in its structure.

How is it unfair that you do not know my name? I don't care about your name; I care about the work presented in this repo. What I do care about is trying to run it, generating poses with many unresolvable clashes, and not understanding where the issue is, again agnostic of names.
2AYO_still_clash
7CEI_still clash
