deepgraphlearning / gearnet
GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)
License: MIT License
Hello, dear authors!
Regarding the datasets used in the experiments, we have the following questions:
By the way, I tried using TorchDrug, but the experience was slightly different from PyG.
Hello, in the appendix you discuss the results of pretraining on different datasets.
As Table 8 shows, the performance is comparable whether pretraining on real PDB structures or on AlphaFold DB (v1 or v2), even though real PDB has only 300,000 structures while AlphaFold has 800,000.
Why do the authors use the larger AlphaFold dataset in the main text? Also, in theory a larger pretraining dataset should give better results; why is this not reflected in Table 8?
Hi, sorry for bothering you about the TorchDrug API. I want to reproduce your results on Fold Classification and Reaction. However, the Fold dataset in torchdrug/datasets/fold/Fold doesn't seem to contain protein structure data; it basically contains only sequences and labels. As I understand GearNet, it is a structure-based method and the pretraining tasks are also structure-based, so I am confused about the current situation.
Besides, I can't find the Reaction dataset in TorchDrug. Could you please tell me which dataset you used, and share the training config as you did for EC and GO?
Sorry again for the trouble. Thank you for open-sourcing such great work.
Hi, could you provide the training details of the basic GearNet (without IEConv) on the Fold prediction task?
The results reported in your paper (page 8) are 28.4, 42.6, and 95.3 on test_fold, test_superfamily, and test_family, respectively.
Is this the config: https://github.com/DeepGraphLearning/GearNet/blob/main/config/downstream/Fold3D/gearnet.yaml? I couldn't find the basic GearNet model architecture in this repo.
Thanks!
Hi there, I noticed that the default config option save_interval: 5 (in pretrain.py) makes the model train on one pickled part of AlphaFold DB (consisting of 220k proteins) for 5 epochs, then on another pickled part for 5 epochs, and so on. (This option also controls how often the model is saved, although the two could be adjusted independently.)
Could you provide some insight into why it is set up this way? Is there a practical reason to train for a sufficient number of epochs on one pickle before moving to the next? Thank you!
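To make the schedule described above concrete, here is a small sketch that maps each epoch to the pickled part it would train on. It is a hypothetical model of the behaviour the question describes, not code taken from pretrain.py; the number of parts and the wrap-around after the last part are assumptions.

```python
def chunk_schedule(num_epoch, save_interval, num_parts):
    """Map each epoch to the pickled part it trains on.

    Hypothetical model of the behaviour described above, not code from
    pretrain.py: stay on one part for `save_interval` consecutive epochs,
    then move to the next part, wrapping around after the last one.
    """
    schedule = []
    for epoch in range(num_epoch):
        part = (epoch // save_interval) % num_parts
        schedule.append((epoch, part))
    return schedule


# Example: 20 epochs, save_interval = 5, and a hypothetical 3 pickled parts.
for epoch, part in chunk_schedule(20, 5, 3):
    print(f"epoch {epoch:2d} -> part {part}")
```

Under this model, save_interval trades off how long the model stays on one part against how often the training distribution shifts; with save_interval = 1, every epoch would see a different part.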
Hi authors, great work! I want to use GearNet as a feature extractor to extract protein features. How can I do that?
Thanks!!!
Hello,
I have come across your fascinating GitHub repository on protein structure pre-training, and I am excited to explore its potential for my own research. I noticed that the provided data is in HDF5 format, and there is no preprocessing code available for PDB files. I would like to use my own PDB files for inference with your model, but I am unsure how to preprocess them to match the expected input format.
Would you be able to provide some guidance or share a sample preprocessing script for converting PDB files to the required HDF5 format? This would greatly help me and other researchers who are interested in utilizing your work for various applications.
Thank you for your time and for sharing your valuable work with the community. I am looking forward to your response and any assistance you can provide.
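As a starting point for custom inputs, the sketch below pulls per-residue C-alpha coordinates out of a PDB file using only the standard library and the fixed-column PDB format. It stops short of writing HDF5 because the exact schema the repository expects is not described here. torchdrug also offers data.Protein.from_pdb for loading PDB files directly, which may be the simpler route. The function name read_ca_trace is hypothetical.

```python
def read_ca_trace(pdb_text):
    """Extract (chain, res_seq, res_name, (x, y, z)) for every C-alpha atom.

    Hypothetical helper based on the fixed-column PDB format:
    atom name = columns 13-16, altLoc = 17, residue name = 18-20,
    chain = 22, residue number = 23-26, x/y/z = 31-38/39-46/47-54.
    HETATM records (waters, ligands, ions) are skipped, and only the
    first alternate location (blank or 'A') is kept.
    """
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue
        res_name = line[17:20].strip()
        chain = line[21]
        res_seq = int(line[22:26])
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        residues.append((chain, res_seq, res_name, (x, y, z)))
    return residues
```

Usage: read_ca_trace(open("my_protein.pdb").read()) returns one entry per residue that has a CA atom, in file order.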
Thank you for your amazing work! I found that for the Fold Classification task, the GearNet-Edge model is implemented on top of the GearNetIEConv script rather than the GearNet script, and the two differ in some details (e.g., an extra input embedding and IEConv layers). Could you therefore provide the GearNet-Edge model pretrained with multiview contrastive learning on the GearNetIEConv script for Fold Classification (rather than the one based on the GearNet script for the EC task)? Thank you.
Hi, since GearNet represents each residue in a protein by its C-alpha coordinates, how do you handle residues whose alpha-carbon coordinates are missing?
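One common workaround (not necessarily what GearNet does) is to drop residues that lack a CA atom, or to approximate the missing position from other backbone atoms. The hypothetical helper below sketches both options; the backbone-centroid fallback is a rough approximation and will not coincide with the true CA position.

```python
def ca_or_fallback(residues, fallback="drop"):
    """Return one coordinate per residue for building a residue graph.

    Hypothetical helper. `residues` is a list of dicts mapping atom name
    -> (x, y, z). With fallback="drop", residues without a CA atom are
    skipped; with fallback="centroid", the mean of the available N, C and
    O backbone atoms is used instead (an approximation, not the real CA).
    """
    coords = []
    for atoms in residues:
        if "CA" in atoms:
            coords.append(atoms["CA"])
            continue
        if fallback == "centroid":
            backbone = [atoms[name] for name in ("N", "C", "O") if name in atoms]
            if backbone:
                n = len(backbone)
                coords.append(tuple(sum(axis) / n for axis in zip(*backbone)))
        # Otherwise (fallback == "drop", or no backbone atoms): skip the residue.
    return coords
```

Dropping residues keeps every node position exact but shortens the sequence; the centroid fallback keeps sequence length at the cost of slightly shifted positions.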
I encountered a shape mismatch issue during runtime.
File "/home/admin/anaconda3/envs/test_env/lib/python3.7/site-packages/torchdrug-0.2.0-py3.7.egg/torchdrug/layers/conv.py", line 813, in message_and_aggregate
return update.view(graph.num_node, self.num_relation * self.input_dim)
RuntimeError: shape '[975, 472]' is invalid for input of size 312000
protein structure:
print(protein, protein.node_feature.shape) # PackedProtein(batch_size=1, num_atoms=[51], num_bonds=[975], num_residues=[51]) torch.Size([51, 21])
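The numbers in the error can be checked directly: .view() asks for 975 rows of 472 features, but the tensor holds 312,000 elements, i.e. 320 per row. Because 975 equals num_bonds rather than num_residues (51), the failing layer appears to run on a graph whose nodes are the edges of the residue graph, such as the line graph used by GearNet-Edge. In that case the configured num_relation × input_dim for that layer (presumably num_angle_bin × edge_input_dim, which is my assumption) does not match the width of the features actually produced. The hypothetical helper below just does this arithmetic.

```python
def per_row_width(numel, num_rows):
    """Return how many features each row really has; raise if not divisible."""
    width, remainder = divmod(numel, num_rows)
    if remainder:
        raise ValueError(f"{numel} elements cannot be split into {num_rows} rows")
    return width


# Numbers copied from the error message above.
numel, num_node, requested_width = 312000, 975, 472
actual_width = per_row_width(numel, num_node)
print(f"layer expects {requested_width} features per node, tensor has {actual_width}")
```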
Hi, I am trying to reproduce the experiments, but my reproduced results differ substantially from the results in the paper.
Reproduced:
GearNet:
EC: 0.514 (200 epochs)
GO-BP: 0.176 (146 epochs)
GO-CC: 0.145 (84 epochs)
GearNet-Edge:
EC: 0.404 (163 epochs)
GO-BP: 0.255 (100 epochs)
GO-CC: 0.163 (107 epochs)
I used the same configurations and hyperparameters as provided in the repo. Training runs on a single GPU, and some of the experiments are still training.
Many thanks
Hi! First of all great job! I have been trying to do node classification in residue view, using my own node labels. However, I haven't been able to configure the NodePropertyPrediction task to use those labels instead of predicting the residue features. Do you have any guidance on how I can proceed to do this? Any help is appreciated
Hello! Thanks for your great work!
For some reason, I couldn't run the whole training loop in your pretraining script, but I got some checkpoints like "model_epoch_25.pth". How can I load such a checkpoint and continue my pretraining?
Looking forward to your reply!
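torchdrug's core.Engine (the solver object in the GearNet scripts) has save and load methods, so one option is to rebuild the solver from the same config, call solver.load("model_epoch_25.pth") if the file was written by solver.save, and then train only the remaining epochs. How pretrain.py writes these checkpoints is not shown here; if a file holds only model weights, loading the state dict into the model directly would be needed instead. The hypothetical helper below only does the bookkeeping: it reads the completed epoch count from the file name and splits the remaining epochs into save_interval-sized rounds.

```python
import re


def resume_plan(checkpoint_name, total_epochs, save_interval):
    """Work out where to restart a chunked pretraining run.

    Hypothetical helper. Assumes checkpoints are named like
    "model_epoch_25.pth", where 25 is the number of completed epochs.
    Returns (completed, remaining, rounds), where `rounds` lists the
    epoch count for each remaining save_interval-sized round.
    """
    match = re.search(r"model_epoch_(\d+)\.pth$", checkpoint_name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {checkpoint_name}")
    done = int(match.group(1))
    remaining = max(total_epochs - done, 0)
    rounds = [min(save_interval, remaining - start)
              for start in range(0, remaining, save_interval)]
    return done, remaining, rounds
```

For example, resume_plan("model_epoch_25.pth", 50, 5) gives (25, 25, [5, 5, 5, 5, 5]).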
Hi!
I have a question about the number of epochs used in the experiments. The paper states 200 epochs for EC, but the config sets num_epoch to 50. Should I change it to 200 to reproduce the experiment?
Thanks for your help!
15:52:05 Config file: ./config/downstream/GO-BP/gearnet_yy.yaml
15:52:05 {'dataset': {'branch': 'BP',
'class': 'GeneOntology',
'path': '/scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/',
'test_cutoff': 0.95,
'transform': {'class': 'ProteinView', 'view': 'residue'}},
'engine': {'batch_size': 2, 'gpus': [0], 'log_interval': 1000},
'metric': 'f1_max',
'optimizer': {'class': 'AdamW', 'lr': 0.0001, 'weight_decay': 0},
'output_dir': '/scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein_output/downstream/GO-BP',
'task': {'class': 'MultipleBinaryClassification',
'criterion': 'bce',
'graph_construction_model': {'class': 'GraphConstruction',
'edge_feature': 'gearnet',
'edge_layers': [{'class': 'SequentialEdge',
'max_distance': 2},
{'class': 'SpatialEdge',
'min_distance': 5,
'radius': 10.0},
{'class': 'KNNEdge',
'k': 10,
'min_distance': 5}],
'node_layers': [{'class': 'AlphaCarbonNode'}]},
'metric': ['auprc@micro', 'f1_max'],
'model': {'batch_norm': True,
'class': 'GearNet',
'concat_hidden': True,
'hidden_dims': [512, 512, 512, 512, 512, 512],
'input_dim': 21,
'num_relation': 7,
'readout': 'sum',
'short_cut': True},
'num_mlp_layer': 3},
'train': {'num_epoch': 200}}
15:52:05 Downloading https://zenodo.org/record/6622158/files/GeneOntology.zip to /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology.zip
15:53:38 Extracting /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology.zip to /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO
15:53:41 Extracting /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology/train.zip to /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology
15:56:21 Extracting /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology/valid.zip to /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology
15:56:37 Extracting /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology/test.zip to /scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/protein-datasets/downstream/GO/GeneOntology
Constructing proteins from pdbs: 0%| | 0/36635 [00:00<?, ?it/s]/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py:213: UserWarning: Unknown residue `HOH`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `HOH`
warnings.warn("Unknown value `%s`" % x)
[15:56:55] Explicit valence for atom # 6 O, 3, is greater than permitted
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py:213: UserWarning: Unknown residue `BIS`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `BIS`
warnings.warn("Unknown value `%s`" % x)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py:213: UserWarning: Unknown residue `EPE`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `EPE`
warnings.warn("Unknown value `%s`" % x)
Constructing proteins from pdbs: 0%| | 3/36635 [00:00<54:08, 11.28it/s]/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py:213: UserWarning: Unknown residue `SO4`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `SO4`
warnings.warn("Unknown value `%s`" % x)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py:213: UserWarning: Unknown residue `PO4`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `PO4`
warnings.warn("Unknown value `%s`" % x)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py:213: UserWarning: Unknown residue `BME`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `BME`
warnings.warn("Unknown value `%s`" % x)
Constructing proteins from pdbs: 0%| | 5/36635 [00:00<1:06:38, 9.16it/s]/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py:42: UserWarning: Unknown value `Fe`
warnings.warn("Unknown value `%s`" % x)
Constructing proteins from pdbs: 0%| | 5/36635 [00:00<1:10:20, 8.68it/s]
Traceback (most recent call last):
File "/scratch/user/yuning.you/project/protein_cross_modal_pretraining/ProteinRepresentation/GearNet/script/downstream.py", line 56, in <module>
dataset = core.Configurable.load_config_dict(cfg.dataset)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/core/core.py", line 269, in load_config_dict
return cls(**new_config)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/core/core.py", line 288, in wrapper
return init(self, *args, **kwargs)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/datasets/gene_ontology.py", line 72, in __init__
self.load_pdbs(pdb_files, verbose=verbose, **kwargs)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/dataset.py", line 750, in load_pdbs
protein = data.Protein.from_molecule(mol, **kwargs)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/utils/decorator.py", line 192, in wrapper
return obj(*args, **kwargs)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/protein.py", line 185, in from_molecule
protein = Molecule.from_molecule(mol, atom_feature=atom_feature, bond_feature=bond_feature,
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/utils/decorator.py", line 192, in wrapper
return obj(*args, **kwargs)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/molecule.py", line 189, in from_molecule
feature += func(atom)
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py", line 77, in atom_default
onehot(atom.GetChiralTag(), chiral_tag_vocab) + \
File "/scratch/user/yuning.you/.conda/envs/protein/lib/python3.9/site-packages/torchdrug/data/feature.py", line 47, in onehot
raise ValueError("Unknown value `%s`. Available vocabulary is `%s`" % (x, vocab))
ValueError: Unknown value `CHI_SQUAREPLANAR`. Available vocabulary is `range(0, 4)`
Dear developers,
Thanks for your great work. When I try a quick fine-tuning run via python script/downstream.py -c ./config/downstream/EC/gearnet.yaml --gpus [0], the error messages above are returned before model training starts (for both EC and GO-BP). I would appreciate your help in resolving it.
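The two kinds of messages in this log come from different paths in torchdrug's feature code: feature.py line 42 only warns about an unknown value, while line 47 raises. The traceback shows the chiral tag being one-hot encoded against range(0, 4) without the tolerant path, so any chiral tag outside those four values aborts loading. CHI_SQUAREPLANAR is one of the chiral types added in newer RDKit releases; a commonly suggested workaround is installing an older RDKit that predates it, though no specific version is confirmed here. The simplified re-implementation below (not torchdrug's actual source) illustrates the two behaviours.

```python
import warnings


def onehot(x, vocab, allow_unknown=False):
    """Simplified one-hot encoder modelled on the behaviour in the traceback.

    With allow_unknown=True, an unknown value only triggers a warning and
    is mapped to an extra trailing "unknown" slot. With allow_unknown=False,
    an unknown value raises ValueError, which is what aborts loading above.
    """
    index = list(vocab).index(x) if x in vocab else -1
    if allow_unknown:
        feature = [0] * (len(vocab) + 1)
        if index == -1:
            warnings.warn(f"Unknown value `{x}`")
        feature[index] = 1  # index -1 selects the trailing unknown slot
    else:
        if index == -1:
            raise ValueError(f"Unknown value `{x}`. Available vocabulary is `{vocab}`")
        feature = [0] * len(vocab)
        feature[index] = 1
    return feature
```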
When I try to load the weights like this:
net = torch.load(pthfile)
I got this error:
RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
I don't know what happened. After searching online, it seems that I need to update my CUDA driver. Are there any other options?
It seems that the file at "https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar" really doesn't exist. When I enter this URL in my browser, it also tells me that the file doesn't exist.
14:43:55 Downloading https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar to /home/horace/scratch/protein-datasets/alphafold/UP000006548_3702_ARATH_v2.tar
Traceback (most recent call last):
File "script/pretrain.py", line 50, in <module>
dataset = core.Configurable.load_config_dict(cfg.dataset)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 269, in load_config_dict
return cls(**new_config)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 288, in wrapper
return init(self, *args, **kwargs)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/datasets/alphafolddb.py", line 122, in __init__
tar_file = utils.download(self.urls[species_id], path, md5=self.md5s[species_id])
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/utils/file.py", line 31, in download
urlretrieve(url, save_file)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Howdy, thank you for sharing the amazing work, and may I ask if you have any plans on releasing the pre-trained weights for ESM-GearNet model? Many thanks!
Dear authors, thanks for your great works.
I have some questions about the data. I tried to visualize the input protein sequence by calling to_sequence() on each batch. Here is the figure showing the sequence.
I wonder why there are so many ".G" fragments, since "." is the separator between multiple sequences (DeepGraphLearning/torchdrug#151). Also, after deleting all the "." characters, the length of the remaining sequence equals the number of nodes in the graph. Could you help explain why there are so many ".G" fragments? Many thanks!
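To quantify the pattern, the sequence can be split on "." and the fragments counted. One hypothesis, consistent with torchdrug's "Unknown residue `HOH`. Treat as glycine" warnings when it builds proteins from PDB files, is that waters and other hetero groups end up as single-residue G fragments. The hypothetical helper below only measures this; it does not confirm the cause.

```python
from collections import Counter


def fragment_summary(sequence):
    """Split a '.'-separated sequence and summarise its fragments.

    Hypothetical helper. Returns (number of fragments, Counter of
    fragment lengths, number of fragments that are exactly "G").
    """
    fragments = [frag for frag in sequence.split(".") if frag]
    lengths = Counter(len(frag) for frag in fragments)
    single_g = sum(1 for frag in fragments if frag == "G")
    return len(fragments), lengths, single_g
```

A large share of single-"G" fragments relative to long fragments would point towards non-protein groups rather than genuine chains.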
Thank you for your great work. However, when I try to run training, I encounter the following error:
python script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus null
Traceback (most recent call last):
File "script/downstream.py", line 11, in <module>
from torchdrug import core, models, tasks, datasets, utils
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/torchdrug/models/__init__.py", line 10, in <module>
from .esm import EvolutionaryScaleModeling
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/torchdrug/models/esm.py", line 6, in <module>
import esm
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/esm/__init__.py", line 8, in <module>
from .data import Alphabet, RobertaAlphabet, BatchConverter, FastaBatchedDataset # noqa
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/esm/data.py", line 11, in <module>
from torchvision.datasets.utils import download_url
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/torchvision/__init__.py", line 5, in <module>
from torchvision import datasets
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/torchvision/datasets/__init__.py", line 1, in <module>
from ._optical_flow import KittiFlow, Sintel, FlyingChairs, FlyingThings3D, HD1K
File "/home/hongyan/envs/pyg/lib/python3.7/site-packages/torchvision/datasets/_optical_flow.py", line 26, in <module>
class FlowDataset(ABC, VisionDataset):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Here is my environment:
torch 1.11.0
torchdrug 0.2.0
pyg 2.0.4
For proteins with multiple chains, did you split them by chain and input the splits into the model one by one, or directly input the whole proteins?
In the section "F ADDITIONAL EXPERIMENTAL RESULTS ON EC AND GO PREDICTION - Pretraining on different datasets" of your paper, you wrote:
Specifically, we extract 123,505
experimentally-determined protein structures from PDB whose resolutions are between 0.0 and 2.5
angstroms, and we further extract 305,265 chains from these proteins to construct the final dataset
which seems to imply that you trained the model on a collection of single protein chains. However, you also ran Enzyme Commission number prediction experiments. To my knowledge, many enzymes contain more than one chain. It seems infeasible to split such an enzyme into separate chains and feed them into the model one by one (that would hardly predict the enzyme type correctly).
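For reference, chains in a PDB file are distinguished by the chain identifier in column 22 of each ATOM record. The hypothetical helper below groups ATOM records by chain; it only illustrates what per-chain splitting would involve and says nothing about how the authors actually built the EC inputs. It assumes a single-model file and ignores HETATM records.

```python
def split_by_chain(pdb_text):
    """Group ATOM records by chain identifier (column 22 of the PDB format).

    Hypothetical helper: returns a dict mapping chain ID -> list of ATOM
    lines, in the order chains first appear (dicts keep insertion order).
    Assumes a single MODEL; HETATM, TER and other records are ignored.
    """
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chains.setdefault(line[21], []).append(line)
    return chains
```

Each group can be joined back with "\n".join(lines) to produce a per-chain PDB fragment if single-chain inputs are wanted.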
I was wondering how the atom view is implemented, since I'm getting a shape mismatch.
In mc-gearnet_edge.yaml I changed the view and entity level to 'atom' and the input dimension to 38, since I found 38 atom types in the torchdrug Protein class.
Is there another setting I need to change?
Thanks for creating GearNet!
Hi! Thank you for your great work!! I just have a quick question. I am trying to use GearNet with the Fold3D dataset using the configuration file you provided, but I keep getting this error (screenshot attached). If I remove mlp_batch_norm and mlp_dropout, the code runs, but the model doesn't seem to train properly. I would really appreciate your input on this, or if you could let me know what I am doing incorrectly.
Thanks a lot!!
Hi,
Thanks for your wonderful work.
When running
# Run GearNet on the Enzyme Commission dataset with 1 GPU
python script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus [0]
I met the following log:
/home/chenshoufa/workspace/torchdrug/torchdrug/data/protein.py:213: UserWarning: Unknown residue `PT`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/feature.py:42: UserWarning: Unknown value ` PT`
warnings.warn("Unknown value `%s`" % x)
Constructing proteins from pdbs: 1%|█▏ | 172/19198 [00:29<1:01:27, 5.16it/s]
/home/chenshoufa/workspace/torchdrug/torchdrug/data/protein.py:213: UserWarning: Unknown residue `COB`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/feature.py:42: UserWarning: Unknown value `COB`
warnings.warn("Unknown value `%s`" % x)
Constructing proteins from pdbs: 1%|█▏ | 183/19198 [00:30<57:46, 5.48it/s]
/home/chenshoufa/workspace/torchdrug/torchdrug/data/feature.py:42: UserWarning: Unknown value `Be`
warnings.warn("Unknown value `%s`" % x)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/protein.py:213: UserWarning: Unknown residue `ADP`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/feature.py:42: UserWarning: Unknown value `ADP`
warnings.warn("Unknown value `%s`" % x)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/protein.py:213: UserWarning: Unknown residue `BEF`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/feature.py:42: UserWarning: Unknown value `BEF`
warnings.warn("Unknown value `%s`" % x)
Constructing proteins from pdbs: 1%|█▏ | 186/19198 [00:31<1:01:54, 5.12it/s]
/home/chenshoufa/workspace/torchdrug/torchdrug/data/protein.py:213: UserWarning: Unknown residue `1NB`. Treat as glycine
warnings.warn("Unknown residue `%s`. Treat as glycine" % type)
/home/chenshoufa/workspace/torchdrug/torchdrug/data/feature.py:42: UserWarning: Unknown value `1NB`
Is this normal?
Thanks in advance.
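These are UserWarnings, not errors: the progress bar keeps advancing after each one. They appear when a PDB file contains non-standard residue names, e.g. ligands such as ADP, which torchdrug then treats as glycine. If the noise is unwanted, they can be filtered with Python's standard warnings module before the dataset is constructed, as in this sketch (the function name is hypothetical). Whether treating such groups as glycine matters for your results is a separate question.

```python
import warnings


def silence_unknown_residue_warnings():
    """Hide the "Unknown residue ..." and "Unknown value ..." UserWarnings.

    The message pattern is a regular expression matched against the start
    of each warning message (case-insensitively); other warnings still show.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"Unknown (residue|value) ",
        category=UserWarning,
    )
```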
Hello,
Is there any code available for the explainability experiment in Section K in the appendix of your paper https://arxiv.org/pdf/2203.06125.pdf?
Thank you.
Hi, thank you for your amazing work!
I am trying to evaluate GearNet on secondary structure dataset, but it gives me this error:
AttributeError: 'PackedProtein' object has no attribute 'node_position'
I think this is because the secondary structure dataset doesn't provide node_position, which GearNet needs.
Is there another way to evaluate GearNet on secondary structure prediction?
Thank you.
When I run
python -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/downstream/GO-BP/gearnet_edge.yaml --gpus [0,1,2,3] --ckpt
on a single worker with 4 Tesla-V100-SXM2-32GB GPUs and 47 CPUs, I got the following error:
[219013] [E ProcessGroupNCCL.cpp:587] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1804901 milliseconds before timing out.
[219014] [E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1805974 milliseconds before timing out.
[219015] [E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1805985 milliseconds before timing out.
[219016] Traceback (most recent call last):
[219017] File "/hubozhen/GearNet/script/downstream.py", line 75, in <module>
[219018] train_and_validate(cfg, solver, scheduler)
[219019] File "/hubozhen/GearNet/script/downstream.py", line 30, in train_and_validate
[219020] solver.train(**kwargs)
[219021] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/core/engine.py", line 155, in train
[219022] loss, metric = model(batch)
[219023] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
[219024] return forward_call(*input, **kwargs)
[219025] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
[219026] output = self.module(*inputs[0], **kwargs[0])
[219027] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
[219028] return forward_call(*input, **kwargs)
[219029] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/tasks/property_prediction.py", line 279, in forward
[219030] pred = self.predict(batch, all_loss, metric)
[219031] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/tasks/property_prediction.py", line 300, in predict
[219032] output = self.model(graph, graph.node_feature.float(), all_loss=all_loss, metric=metric)
[219033] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
[219034] return forward_call(*input, **kwargs)
[219035] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/models/gearnet.py", line 99, in forward
[219036] edge_hidden = self.edge_layers[i](line_graph, edge_input)
[219037] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
[219038] return forward_call(*input, **kwargs)
[219039] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/layers/conv.py", line 92, in forward
[219040] output = self.combine(input, update)
[219041] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/layers/conv.py", line 438, in combine
[219042] output = self.batch_norm(output)
[219043] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
[219044] return forward_call(*input, **kwargs)
[219045] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 758, in forward
[219046] world_size,
[219047] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/nn/modules/_functions.py", line 42, in forward
[219048] dist._all_gather_base(combined_flat, combined, process_group, async_op=False)
[219049] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 2070, in _all_gather_base
[219050] work = group._allgather_base(output_tensor, input_tensor)
[219051] RuntimeError: NCCL communicator was aborted on rank 0. Original reason for failure was: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1804901 milliseconds before timing out.
[219052] /opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/layers/functional/functional.py:474: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
[219053] index1 = local_index // local_inner_size + offset1
[219054] /opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/layers/functional/functional.py:474: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
[219055] index1 = local_index // local_inner_size + offset1
[219056] [E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[219057] terminate called after throwing an instance of 'std::runtime_error'
[219058] what(): [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=_ALLGATHER_BASE, Timeout(ms)=1800000) ran for 1804901 milliseconds before timing out.
[219059] [E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[219060] terminate called after throwing an instance of 'std::runtime_error'
[219061] what(): [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1805974 milliseconds before timing out.
[219062] /opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torchdrug/data/graph.py:1667: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
[219063] edge_in_index = local_index // local_inner_size + edge_in_offset
[219064] [E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
[219065] terminate called after throwing an instance of 'std::runtime_error'
[219066] what(): [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=BROADCAST, Timeout(ms)=1800000) ran for 1805985 milliseconds before timing out.
[219067] WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 21 closing signal SIGTERM
[219068] ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 20) of binary: /opt/anaconda3/envs/manifold/bin/python
[219069] Traceback (most recent call last):
[219070] File "/opt/anaconda3/envs/manifold/lib/python3.7/runpy.py", line 193, in _run_module_as_main
[219071] "__main__", mod_spec)
[219072] File "/opt/anaconda3/envs/manifold/lib/python3.7/runpy.py", line 85, in _run_code
[219073] exec(code, run_globals)
[219074] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
[219075] main()
[219076] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
[219077] launch(args)
[219078] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
[219079] run(args)
[219080] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
[219081] )(*cmd_args)
[219082] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
[219083] return launch_agent(self._config, self._entrypoint, list(args))
[219084] File "/opt/anaconda3/envs/manifold/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
[219085] failures=result.failures,
[219086] torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
[219087] ===================================================
[219088] /hubozhen/GearNet/script/downstream.py FAILED
[219089] ---------------------------------------------------
[219090] Failures:
[219091] [1]:
[219092] time : 2022-12-12_09:41:02
[219093] host : pytorch-7c3c96f1-d9hcm
[219094] rank : 2 (local_rank: 2)
[219095] exitcode : -6 (pid: 22)
[219096] error_file: <N/A>
[219097] traceback : Signal 6 (SIGABRT) received by PID 22
[219098] [2]:
[219099] time : 2022-12-12_09:41:02
[219100] host : pytorch-7c3c96f1-d9hcm
[219101] rank : 3 (local_rank: 3)
[219102] exitcode : -6 (pid: 23)
[219103] error_file: <N/A>
[219104] traceback : Signal 6 (SIGABRT) received by PID 23
[219105] ---------------------------------------------------
[219106] Root Cause (first observed failure):
[219107] [0]:
[219108] time : 2022-12-12_09:41:02
[219109] host : pytorch-7c3c96f1-d9hcm
[219110] rank : 0 (local_rank: 0)
[219111] exitcode : -6 (pid: 20)
[219112] error_file: <N/A>
[219113] traceback : Signal 6 (SIGABRT) received by PID 20
[219114] ===================================================
Someone said this happens when loading large datasets. I found that all four GPUs were at 100% utilization.
However, when I ran the same procedure on another V100 machine (1 worker: 4× Tesla-V100-SXM-32GB GPUs, 48 CPUs), it worked fine.
This confuses me.
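The log reports `Timeout(ms)=1800000`, i.e. 30 minutes, which is PyTorch's default timeout for collectives; one rank still waiting in a broadcast after that long (for example because another rank is slow to load data) triggers exactly this watchdog abort. A minimal sketch, assuming you control the call to `torch.distributed.init_process_group` (the GearNet/TorchDrug scripts may initialize the process group themselves, and `init_distributed` is a hypothetical helper), is to pass a longer `timeout`:

```python
import datetime

# Values copied from the log above (milliseconds).
LOGGED_TIMEOUT_MS = 1_800_000
LOGGED_ELAPSED_MS = 1_805_985


def ms_to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes."""
    return ms / 60_000


def collective_timeout(minutes: float) -> datetime.timedelta:
    """Build the timedelta accepted by init_process_group's `timeout` argument."""
    return datetime.timedelta(minutes=minutes)


def init_distributed(timeout_minutes: float = 120.0) -> None:
    """Hypothetical helper: initialize NCCL with a longer collective timeout.

    Assumes the launcher has already set the env:// variables
    (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE).
    """
    import torch.distributed as dist  # imported lazily; requires PyTorch

    dist.init_process_group(
        backend="nccl",
        init_method="env://",
        timeout=collective_timeout(timeout_minutes),
    )


print(ms_to_minutes(LOGGED_TIMEOUT_MS))  # 30.0
```

A longer timeout only tolerates a slow rank; it does not explain why the same job is slower on one machine than on the other.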
Hello,
Running
python -m torch.distributed.launch --nproc_per_node=4 script/downstream.py -c config/downstream/EC/gearnet.yaml --gpus [0,1,2,3]
fails with the following log:
20:21:09 Extracting /home/chenshoufa/scratch/protein-datasets/EnzymeCommission.zip to /home/chenshoufa/scratch/protein-datasets
20:21:09 Extracting /home/chenshoufa/scratch/protein-datasets/EnzymeCommission.zip to /home/chenshoufa/scratch/protein-datasets
20:21:09 Extracting /home/chenshoufa/scratch/protein-datasets/EnzymeCommission.zip to /home/chenshoufa/scratch/protein-datasets
20:21:09 Extracting /home/chenshoufa/scratch/protein-datasets/EnzymeCommission.zip to /home/chenshoufa/scratch/protein-datasets
Loading /home/chenshoufa/scratch/protein-datasets/EnzymeCommission/enzyme_commission.pkl.gz: 64%|██████████████████████████████████████████▉ | 11854/18515 [08:49<20:55, 5.30it/s]Killing subprocess 1350247
Killing subprocess 1350248
Killing subprocess 1350249
Killing subprocess 1350250
Traceback (most recent call last):
File "/home/chenshoufa/anaconda3/envs/gear/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/chenshoufa/anaconda3/envs/gear/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/chenshoufa/anaconda3/envs/gear/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home/chenshoufa/anaconda3/envs/gear/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/chenshoufa/anaconda3/envs/gear/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/chenshoufa/anaconda3/envs/gear/bin/python', '-u', 'script/downstream.py', '--local_rank=3', '-c', 'config/downstream/EC/gearnet.yaml', '--gpus', '[0,1,2,3]']' died with <Signals.SIGKILL: 9>.
Could you help me with this issue?
Thanks for the wonderful work!
I am trying to use the learned embeddings for a downstream protein classification problem on my own datasets. Since training the model requires a good HPC setup, I am wondering:
Howdy, thank you for your awesome work on "Enhancing Protein Language Models with Structure-based Encoder and Pre-training". I am running the pre-training experiment now, and I am facing the error "Can't find atom_feature in features.atom". I paste the error statement below.
It seems that it cannot recognize atom_feature: null or bond_feature: null. Do I need to change the source code to support these two arguments?
Any help would be appreciated!
Hi, thank you for your amazing work!
I tried to reproduce the GearNet results on the Fold3D dataset, following the original .yaml file, which specifies the StepLR scheduler. However, the following error occurs when the scheduler is built. What could cause this? Thank you!
15:28:38 #train: 12312, #valid: 736, #test: 718
Traceback (most recent call last):
File "script/downstream.py", line 74, in <module>
solver, scheduler = util.build_downstream_solver(cfg, dataset)
File "/GearNet-new/util.py", line 121, in build_downstream_solver
scheduler = core.Configurable.load_config_dict(cfg.scheduler)
File "/torchdrug/lib/python3.8/site-packages/torchdrug/core/core.py", line 269, in load_config_dict
return cls(**new_config)
File "/torchdrug/lib/python3.8/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/torchdrug/lib/python3.8/site-packages/torchdrug/core/core.py", line 288, in wrapper
return init(self, *args, **kwargs)
File "/torchdrug/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 367, in __init__
super(StepLR, self).__init__(optimizer, last_epoch, verbose)
File "/torchdrug/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 367, in __init__
super(StepLR, self).__init__(optimizer, last_epoch, verbose)
File "/torchdrug/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 367, in __init__
super(StepLR, self).__init__(optimizer, last_epoch, verbose)
[Previous line repeated 991 more times]
RecursionError: maximum recursion depth exceeded while calling a Python object
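The same frame, `super(StepLR, self).__init__(...)`, repeats until the recursion limit. That is what happens when the name `StepLR` inside `lr_scheduler.py` resolves, at call time, to a subclass of the original class: `super()` then steps back into the original `__init__` forever. Whether TorchDrug's registration does this is an assumption we have not verified; the stand-in classes below are hypothetical and only reproduce the pattern in plain Python:

```python
class Base:
    """Stand-in for the base scheduler class."""

    def __init__(self, optimizer):
        self.optimizer = optimizer


class StepLR(Base):
    """Stand-in for a scheduler using the explicit two-argument super()."""

    def __init__(self, optimizer):
        # "StepLR" is looked up in module globals when this line runs.
        super(StepLR, self).__init__(optimizer)


OriginalStepLR = StepLR


def make_configurable(cls):
    """Hypothetical wrapper: a subclass of `cls` that keeps the same name."""

    class Wrapped(cls):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)

    Wrapped.__name__ = cls.__name__
    return Wrapped


def instantiate(rebind_global: bool) -> str:
    """Build the wrapped scheduler, optionally rebinding the global name first."""
    global StepLR
    wrapped = make_configurable(OriginalStepLR)
    if rebind_global:
        StepLR = wrapped  # the module-level name now points at the subclass
    try:
        wrapped("optimizer")
        return "ok"
    except RecursionError:
        return "RecursionError"
    finally:
        StepLR = OriginalStepLR  # restore the original binding


print(instantiate(rebind_global=False))  # ok
print(instantiate(rebind_global=True))   # RecursionError
```

If this is the mechanism, the fix lies in how the scheduler class is registered or in the library versions installed, not in the values in the YAML file.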
Hi, when I execute the command:
python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]
The following error occurs. How can I solve this problem?
RuntimeError:
General Union types are not currently supported. Only Union[T, NoneType] (i.e. Optional[T]) is supported.:
File "/home/lvqy/anaconda3/envs/ZernikeMetric/lib/python3.8/site-packages/torch_cluster/rw.py", line 18
num_nodes: Optional[int] = None,
return_edge_indices: bool = False,
) -> Union[Tensor, Tuple[Tensor, Tensor]]:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
"""Samples random walks of length :obj:`walk_length` from all node indices
in :obj:`start` in the graph given by :obj:`(row, col)` as described in the
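The error says the TorchScript compiler of the installed PyTorch rejects the `Union[Tensor, Tuple[Tensor, Tensor]]` return annotation in `torch_cluster/rw.py`, which usually means `torch_cluster` was built for a newer PyTorch than the one installed. The sketch below compares a version string against a minimum; the threshold of 1.10 is an assumption about when TorchScript gained general `Union` support, so verify it against the PyTorch release notes before relying on it:

```python
import re
from typing import Tuple

# Assumption: general Union support in TorchScript arrived in PyTorch 1.10.
MIN_TORCH_FOR_UNION = (1, 10)


def parse_version(version: str) -> Tuple[int, int]:
    """Parse strings like '1.8.1+cu111' or '2.0.1' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torchscript_supports_union(torch_version: str) -> bool:
    """True if the version is at or above the assumed minimum."""
    return parse_version(torch_version) >= MIN_TORCH_FOR_UNION
```

Usage: `import torch; print(torch.__version__, torchscript_supports_union(torch.__version__))`. If it prints `False`, installing a `torch_cluster` build that matches the installed PyTorch (or upgrading PyTorch) is the usual direction.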
Hello, could I get the configuration file for training on the Fold dataset?
Hi, thank you for sharing this work and answering questions. I recently tried to reproduce the ProteinBERT results shown in your paper. However, the performance when directly using the given config file is only about 0.079.
The loss is quite low on the training, validation, and test sets, but the model does not seem to classify the data correctly. The downstream task is EC.
Do you have any suggestions for fixing this issue?
I also tried rerunning the experiments with HuggingFace ProtBERT. The result is also around 0.078, again with low loss values. Would you be willing to give any advice on this as well?
Many thanks for your answer!
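A low loss combined with a poor score is not contradictory in a sparse multi-label task such as EC: when almost every label is negative, a model that predicts a small probability for every label already achieves a small mean binary cross-entropy. The sketch below uses a hypothetical label count and does not reproduce the actual EC setup or the metric reported in the paper:

```python
import math


def mean_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over independent labels (multi-label setting)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


num_labels = 500  # hypothetical label count, for illustration only
labels = [1] + [0] * (num_labels - 1)  # one positive label for this protein

# A degenerate model that ignores its input and predicts 0.01 everywhere.
constant_probs = [0.01] * num_labels
# The same model, except the true label now gets the highest score.
ranked_probs = [0.9] + [0.01] * (num_labels - 1)

print(mean_bce(constant_probs, labels))
print(mean_bce(ranked_probs, labels))
```

Both losses are around 0.01 to 0.02, although only the second model ranks the correct label first. Monitoring a ranking-based metric on the validation set, rather than the loss alone, shows whether the model is learning anything.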
Hi,
Thank you for your great work. Do you plan on releasing the model's weights any time soon? The README doesn't seem to mention any pretrained model. This would be very helpful to quickly get representations for new sequences.
Hi, I've learned a lot from this great work. Thank you for presenting it in the paper and here!
I wanted to ask about the implementation of the series connection of PLM & GNN in FusionNetwork. In the PLM+GNN paper (Zhang, Z. et al. Enhancing Protein Language Models with Structure-based Encoder and Pre-training. arXiv (2023) doi:10.48550/arxiv.2303.06275), the authors tested three ways of fusing PLM & GNN and chose the series connection. The series connection is described as
Series: we replace the node features of GearNet with the output of ESM-1b and use the output of GearNet as final representations.
In the implementation of FusionNetwork, I saw that it does use the output of ESM-1b as the node features of GearNet, but it then seems to use the output of GearNet concatenated with the output of ESM-1b as the final representation (pasted below). Which way did the authors find most effective? Should one use the output of GearNet alone or the concatenated output?
def forward(self, graph, input, all_loss=None, metric=None):
    output1 = self.sequence_model(graph, input, all_loss, metric)
    node_output1 = output1.get("node_feature", output1.get("residue_feature"))
    output2 = self.structure_model(graph, node_output1, all_loss, metric)
    node_output2 = output2.get("node_feature", output2.get("residue_feature"))
    node_feature = torch.cat([node_output1, node_output2], dim=-1)
    graph_feature = torch.cat([
        output1['graph_feature'],
        output2['graph_feature']
    ], dim=-1)
    return {
        "graph_feature": graph_feature,
        "node_feature": node_feature
    }
If possible, could you please share the configurations you used when trying out the "cross" style (quoted below) of fusing PLM & GNN? I am interested in testing this option and would like to learn which transformer configurations (number of layers, hidden dimensions, number of heads) you tried.
Cross: we concatenate the output of ESM-1b and GearNet and then feed them into a transformer to perform cross-attention between modalities. The output of the transformer will be used as final representations.
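To compare the two readouts discussed in the series question, one option is a small helper that runs the same two models and either keeps GearNet's output alone (the paper's description of the series connection) or concatenates both outputs (what the quoted `forward` does). This is a hypothetical sketch, not part of GearNet: the models are any callables returning dicts with the keys used in the quoted code, and `all_loss`/`metric` are omitted for brevity:

```python
from typing import Any, Callable, Dict, Optional


def _node_output(output: Dict[str, Any]) -> Any:
    # Same fallback as the quoted forward(): node_feature, else residue_feature.
    return output.get("node_feature", output.get("residue_feature"))


def fusion_forward(
    sequence_model: Callable,
    structure_model: Callable,
    graph: Any,
    node_input: Any,
    concat: Optional[Callable] = None,
) -> Dict[str, Any]:
    """concat=None: return the structure model's output only (series readout).
    Otherwise: join both models' outputs with `concat` (concatenated readout)."""
    output1 = sequence_model(graph, node_input)
    node_output1 = _node_output(output1)
    output2 = structure_model(graph, node_output1)  # PLM output feeds the GNN
    node_output2 = _node_output(output2)
    if concat is None:
        return {"graph_feature": output2["graph_feature"], "node_feature": node_output2}
    return {
        "graph_feature": concat([output1["graph_feature"], output2["graph_feature"]]),
        "node_feature": concat([node_output1, node_output2]),
    }
```

With PyTorch tensors, `concat=lambda xs: torch.cat(xs, dim=-1)` reproduces the quoted behavior; leaving `concat` unset gives the GearNet-only readout, so both variants can be trained under the same settings.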
Hey, sorry to trouble you with a question about the pre-training details of GearNet. :)
After pre-training on the AlphaFold Database, do you freeze the model's parameters and only train the prediction head? Or do you fine-tune the pre-trained model together with its prediction head?
Hi!
I was wondering whether there is any reason the GearNetIEConv encoder would return different embeddings for the same input file. I encountered this with my own data, but when I set a torch manual_seed, the embeddings became identical for the same input. Is this expected to have any effect on model performance?
Thanks for your help!
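If the same input gives different embeddings and seeding removes the difference, the randomness probably comes from the model rather than the data. Two likely sources (both assumptions to check) are weights that are randomly initialized because no checkpoint was loaded, and dropout that is still active because the model was not switched to `eval()` mode. Seeding makes runs repeatable, but it does not make randomly initialized weights meaningful. A minimal seeding helper:

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, and NumPy/PyTorch if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the RNG on all devices
    except ImportError:
        pass
```

For embedding extraction in PyTorch, calling `model.eval()` beforehand disables dropout, so a loaded model should then give the same output for the same input even without a seed.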
Hi, I downloaded about 38k PDB files using the config files, but the paper indicates that the pre-training dataset contains 805k structures. Is this expected? Many thanks!
Hello,
I would like to ask how to change the URL 'https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar' that is used when running the command 'python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]'.
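The quoted URL has the form `<base>/<proteome ID>_<taxonomy ID>_<species code>_v<version>.tar`. Assuming other proteome archives on that server follow the same pattern (check the directory listing, since the available files and the version suffix change between AlphaFold DB releases), a replacement URL can be assembled with a hypothetical helper like this:

```python
# Base directory taken from the URL quoted above.
AFDB_BASE = "https://ftp.ebi.ac.uk/pub/databases/alphafold/latest"


def proteome_url(proteome_id: str, taxon_id: int, species_code: str, version: int) -> str:
    """Archive URL following the naming pattern of the quoted URL (assumption)."""
    return f"{AFDB_BASE}/{proteome_id}_{taxon_id}_{species_code}_v{version}.tar"


print(proteome_url("UP000006548", 3702, "ARATH", 2))
```

Where the URL is set (config file or dataset class) is not shown here, so locate it in the code before substituting a new value.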
Hi authors, great work! I noticed that GearNet is evaluated on four downstream tasks: 1) EC number prediction, 2) GO term prediction, 3) fold classification, and 4) reaction classification. I am interested in the GO term prediction task. Could you release the corresponding dataset for this task? Thanks!!!
Thank you so much for your outstanding work!
I'm interested in your models and would like to run them on some custom datasets. Unfortunately, I haven't found any instructions on how to preprocess the raw data. Could you please tell me whether it is possible to run your models on custom datasets? And if so, where can I find your preprocessing script?
Thank you!
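GearNet represents a protein as a residue-level graph built on alpha-carbon (CA) coordinates. As a rough, library-free starting point for preprocessing a custom PDB file, the sketch below reads CA atoms from the fixed-column ATOM records and connects residues closer than a distance cutoff. It is not the authors' preprocessing script: the cutoff is a hyperparameter to match against the config you use, and GearNet's other edge types (sequential and k-nearest-neighbor edges) are not built here:

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def parse_ca_atoms(pdb_text: str, chain: Optional[str] = None) -> List[Tuple[str, Coord]]:
    """Return (residue name, CA coordinates) per residue, in file order.

    Uses the fixed-column PDB ATOM layout and keeps only CA atoms whose
    alternate-location indicator is blank or "A".
    """
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[16] not in (" ", "A"):
            continue
        if chain is not None and line[21] != chain:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.append((line[17:20].strip(), xyz))
    return residues


def radius_edges(coords: Sequence[Coord], cutoff: float) -> List[Tuple[int, int]]:
    """Directed edges (i, j), i != j, between residues with CA distance < cutoff."""
    edges = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) < cutoff:
                edges.append((i, j))
    return edges
```

Usage, with a hypothetical file name: `residues = parse_ca_atoms(open("my_protein.pdb").read())`, then `radius_edges([xyz for _, xyz in residues], cutoff=10.0)`.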
Hello! GearNet is really great work! I have a question, though. The hidden dimensions set in the config file are [512,512,512,512,512,512]. Since I don't know much about how graph neural networks work, I would like to know what each of these dimensions represents. Thank you!
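Each entry of a `hidden_dims` list is conventionally the output width of one layer, so `[512, 512, 512, 512, 512, 512]` describes six layers, each mapping every residue's feature vector to 512 learned numbers; individual coordinates have no predefined meaning. The bookkeeping sketch below assumes an input width of 21 and an option to concatenate all layer outputs (`concat_hidden`); both are assumptions to check against the config you use:

```python
from typing import List, Tuple


def layer_shapes(input_dim: int, hidden_dims: List[int]) -> List[Tuple[int, int]]:
    """(input size, output size) of each layer, in order."""
    dims = [input_dim] + list(hidden_dims)
    return list(zip(dims[:-1], dims[1:]))


def final_node_dim(hidden_dims: List[int], concat_hidden: bool) -> int:
    """Size of each residue's final representation."""
    return sum(hidden_dims) if concat_hidden else hidden_dims[-1]


hidden_dims = [512] * 6
for layer, (d_in, d_out) in enumerate(layer_shapes(21, hidden_dims), start=1):
    print(f"layer {layer}: {d_in} -> {d_out}")
print(final_node_dim(hidden_dims, concat_hidden=True))   # 3072
print(final_node_dim(hidden_dims, concat_hidden=False))  # 512
```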
Hello, I am not very experienced at reading code, so I have a small problem understanding the model:
(Because I am very interested in your work, I am sorry to ask so many questions~)
According to the mc_gearnet_edge.yaml file, the Multiview Contrast model is followed by a multi-layer perceptron. However, the output of Multiview Contrast is split into output1 and output2, each consisting of graph features and node features, while the MLP takes only one input.
Looking forward to your reply very much!
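In SimCLR-style contrastive learning, a single projection head is usually applied to each view separately: the MLP takes one input and is simply called twice, once per view, before the two results are compared. Whether Multiview Contrast does exactly this is an assumption here; the plain-Python sketch below only illustrates the pattern with a hypothetical one-layer head and an InfoNCE-style loss:

```python
import math
from typing import List

Vector = List[float]


def mlp(x: Vector, weights: List[Vector]) -> Vector:
    """One linear layer + ReLU, standing in for a projection head."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0


def info_nce(z1: List[Vector], z2: List[Vector], tau: float) -> float:
    """Row i of z1 and z2 are two views of protein i; other rows are negatives."""
    loss = 0.0
    for i, anchor in enumerate(z1):
        logits = [cosine(anchor, other) / tau for other in z2]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(z1)


weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical head parameters, shared by both views
view1 = [[1.0, 0.0], [0.0, 1.0]]    # graph features of view 1 (batch of 2 proteins)
view2 = [[1.0, 0.1], [0.1, 1.0]]    # graph features of view 2, same proteins

z1 = [mlp(g, weights) for g in view1]  # the same head is called once per view
z2 = [mlp(g, weights) for g in view2]
print(info_nce(z1, z2, tau=0.1) < info_nce(z1, z2[::-1], tau=0.1))  # True
```

Matching views give a much lower loss than mismatched ones, which is the signal the contrastive objective trains on.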
Hi, thanks for your work!
In the Fold3D dataset class, why is the edge_list field set to the placeholder edge list [[0,0,0]] when the input hdf5 files are loaded (on line 85 in dataset.py)? I'm trying to load my own protein graphs into the GearNetIEConv model to get an output embedding, and if I don't set edge_list=[[0,0,0]], I run into an IndexOutOfBounds error later on when the protein graph is passed through the "message" function of the GeometricRelationalGraphConv layer.
Hi, is there a more detailed description of the data structure of the protein .hdf5 files in the Fold3D dataset? They contain a lot of information about each protein, but I am not sure what some of the fields mean.
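Without documentation, one way to see what each .hdf5 file holds is to list every group and dataset together with its shape and dtype. A sketch using h5py (a third-party package, assumed installed) and its `visititems` traversal; the function names and the file path are hypothetical:

```python
from typing import Any, List


def describe_item(name: str, obj: Any) -> str:
    """One line per HDF5 object: datasets show shape and dtype, groups end in '/'."""
    shape = getattr(obj, "shape", None)
    if shape is None:
        return f"{name}/"
    return f"{name}  shape={tuple(shape)} dtype={obj.dtype}"


def hdf5_layout(path: str) -> List[str]:
    """List every group and dataset below the file root."""
    import h5py  # third-party: pip install h5py

    lines: List[str] = []
    with h5py.File(path, "r") as f:
        # visititems stops early if the callback returns a non-None value;
        # list.append returns None, so every item is visited.
        f.visititems(lambda name, obj: lines.append(describe_item(name, obj)))
    return lines
```

Usage: `for line in hdf5_layout("protein.hdf5"): print(line)`. Per-object metadata, if any, is stored separately in each object's `attrs`.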
"Specifically, we extract 123,505 experimentally-determined protein structures from PDB"
Hi, appreciate your great work!
I am quite new to this area. Could you please tell me where the code to download and preprocess these experimentally determined proteins is?
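The quoted sentence does not say which pipeline produced the 123,505 structures, so the sketch below is not the authors' code. It only shows how individual entries can be fetched from the RCSB file server, given a list of PDB IDs you choose; selecting and filtering structures is left out. Very large entries are distributed only as mmCIF, so `fmt="cif"` is the safer choice for bulk downloads:

```python
import urllib.request
from pathlib import Path

RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def pdb_url(pdb_id: str, fmt: str = "pdb") -> str:
    """URL of one entry on the RCSB file server (fmt: 'pdb' or 'cif')."""
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"{RCSB_DOWNLOAD}/{pdb_id.upper()}.{fmt}"


def download_entry(pdb_id: str, out_dir: str, fmt: str = "pdb") -> Path:
    """Download one entry into out_dir and return the local path."""
    out = Path(out_dir) / f"{pdb_id.upper()}.{fmt}"
    out.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(pdb_url(pdb_id, fmt), str(out))
    return out
```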
Hello! Amazing work. I am curious about a detail in the setup of the different pre-training tasks specified in the config files (.yaml files). In the configs for the self-prediction tasks, a TruncateProtein transform with max_length=100 seems to be applied to the AlphaFoldDB dataset, while the config for the Multiview Contrast task has none. Is a similar truncation specified implicitly somewhere else for the MC task? Is truncation with max_length=100 needed to reproduce the pre-training results on the self-prediction tasks?
Thank you!
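To make the question concrete: truncating to `max_length=100` caps each protein at 100 consecutive residues, which bounds the graph size per sample. The hypothetical re-implementation below (not TorchDrug's `TruncateProtein`) shows the two obvious variants, keeping the first residues or a random contiguous window; which one the real transform applies depends on its options, so check the TorchDrug source:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def truncate(residues: Sequence[T], max_length: int,
             rng: Optional[random.Random] = None) -> List[T]:
    """Keep at most max_length consecutive residues.

    Without rng, the first max_length residues are kept; with rng, a random
    contiguous window is chosen instead.
    """
    if len(residues) <= max_length:
        return list(residues)
    # randint is inclusive, so the last window ends exactly at len(residues).
    start = 0 if rng is None else rng.randint(0, len(residues) - max_length)
    return list(residues[start:start + max_length])
```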