Hi 👋! In the paper it is mentioned: "For DIPS, the split is based on protein family t


Source code for DIPS split about equidock_public HOT 6 OPEN

octavian-ganea commented on May 24, 2024

Source code for DIPS split

from equidock_public.

Comments (6)

AxelGiottonini commented on May 24, 2024

What I did in a previous project was to cluster the proteins with foldseek (all vs. all) and build a graph with every protein as a vertex, adding an edge between paired proteins (receptor - ligand) and between proteins in the same cluster. I then used the biggest connected groups for the training set and the smallest for validation and testing (90-5-5).
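A minimal sketch of that graph-based split, using only the standard library (a union-find instead of a graph package). The protein names, the toy edges, and the helpers `connected_components` and `split_components` are made up for illustration; this is not the code used for the original DIPS split.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    # Largest components first, so they end up in the training set.
    return sorted(groups.values(), key=len, reverse=True)


def split_components(components, train_frac=0.90, val_frac=0.05):
    """Assign whole components to train/val/test, largest first."""
    total = sum(len(c) for c in components)
    train, val, test = [], [], []
    for comp in components:
        if len(train) < train_frac * total:
            train.extend(comp)
        elif len(val) < val_frac * total:
            val.extend(comp)
        else:
            test.extend(comp)
    return train, val, test


# Toy data (hypothetical): four receptor-ligand pairs.
proteins = ["r1", "l1", "r2", "l2", "r3", "l3", "r4", "l4"]
pair_edges = [("r1", "l1"), ("r2", "l2"), ("r3", "l3"), ("r4", "l4")]
# Edges between proteins that the structural clustering put together.
cluster_edges = [("r1", "r2"), ("l2", "r3")]

components = connected_components(proteins, pair_edges + cluster_edges)
train, val, test = split_components(components, train_frac=0.5, val_frac=0.25)
```

Because whole components are assigned at once, the resulting fractions only approximate the requested ones, but no component is ever shared between two sets.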

Another option could be to characterize the binding pocket and split the data according to that characterization, but I lack the knowledge to do that.


anton-bushuiev commented on May 24, 2024

Thank you for sharing!

Yes, I am also considering creating a split based on interface similarity using a tool like this.


AxelGiottonini commented on May 24, 2024

Hey!

I don't remember finding any code for the split, but you can certainly write a simple script that clusters your proteins with foldseek or something similar and builds the graph with dgl, networkx or any other graph library you want. The only thing you then need to output is the list of files, in the same format as the original split definitions.
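A rough sketch of the glue code such a script might need. It assumes the clustering result is a two-column, tab-separated file (cluster representative, then member), which is the layout I understand Foldseek's `easy-cluster` to write, and it assumes the split definition is one file name per line. Both are assumptions to check against your actual files; the helper names and file names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_clusters(tsv_file):
    """Read 'representative<TAB>member' rows into {representative: [members]}."""
    clusters = defaultdict(list)
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        clusters[row[0]].append(row[1])
    return dict(clusters)


def cluster_edges(clusters):
    """Turn each cluster into edges (representative, member) for a graph."""
    edges = []
    for rep, members in clusters.items():
        edges.extend((rep, m) for m in members if m != rep)
    return edges


def write_split(names, out_file):
    """Write one entry per line (assumed split-definition format)."""
    for name in names:
        out_file.write(name + "\n")


# In-memory stand-ins for real files, so the sketch runs on its own.
example_tsv = io.StringIO("a.pdb\ta.pdb\na.pdb\tb.pdb\nc.pdb\tc.pdb\n")
clusters = read_clusters(example_tsv)
edges = cluster_edges(clusters)

out = io.StringIO()
write_split(["a.pdb", "b.pdb"], out)
split_text = out.getvalue()
```

The resulting `edges` can be fed to networkx, dgl, or any other graph library to find connected groups before assigning them to splits.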

Sincerely meow!


anton-bushuiev commented on May 24, 2024

Hi, @AxelGiottonini!

Thank you very much for your response. Foldseek looks perfect; I did not know about it. What exactly do you mean by using a graph library? Clustering PPIs using graph metrics based on their EquiDock graph representations? Also, I am still curious how exactly the PPIs were split based on the folds of the individual interacting partners. If PPI1 has partners with folds A and B, and PPI2 has partners with folds C and D, are they placed in different splits whenever {A, B} != {C, D}, or only under the stricter condition that {A, B} and {C, D} are disjoint 🤔? This may be important from the perspective of data leakage.
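To make the two candidate criteria concrete, here is a small sketch. The fold labels and the helpers `differs`, `disjoint`, and `leaking_pairs` are hypothetical; which criterion the paper actually used is exactly the open question.

```python
def differs(ppi1, ppi2):
    """Loose criterion: the two fold sets are not identical."""
    return set(ppi1) != set(ppi2)


def disjoint(ppi1, ppi2):
    """Strict criterion: the two fold sets share no fold at all."""
    return set(ppi1).isdisjoint(ppi2)


def leaking_pairs(train, test):
    """List (train_ppi, test_ppi) pairs that share at least one fold."""
    return [(a, b) for a in train for b in test if not disjoint(a, b)]


# Each PPI is described by the folds of its two partners.
ppi1 = ("A", "B")
ppi2 = ("B", "C")

loose_ok = differs(ppi1, ppi2)    # True: {A, B} != {B, C}
strict_ok = disjoint(ppi1, ppi2)  # False: fold B occurs in both
leaks = leaking_pairs([ppi1], [ppi2, ("C", "D")])
```

In this toy case the loose criterion would allow `ppi1` and `ppi2` to sit in different splits even though fold B appears on both sides, which is the kind of leakage the strict criterion rules out.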


AxelGiottonini commented on May 24, 2024

You're welcome! I had not looked for such a tool, but that seems promising!

Also, when I was working with EquiDock, I got poor accuracy when considering only the ligand RMSD (the receptor RMSD is always 0). I'll share my code and results in the next few days, but could you consider sharing your results if you saw something similar?


anton-bushuiev commented on May 24, 2024

Hi! I do not use EquiDock; I was mainly interested in the data split. I am working on a related problem: predicting the change in binding affinity upon mutation (based on the SKEMPI2 data). It is about learning from already bound structures, so it's a bit different.

