molml / moleculeace
A tool for evaluating the predictive performance of machine learning models on activity cliff compounds
License: MIT License
I've been unable to run the example. It doesn't seem possible to directly reproduce the environment you used, and I get an exception when I run your code in an environment I created with the following commands:
conda create -n moleculeACE python=3.8
conda activate moleculeACE
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install tensorflow
conda install pyg -c pyg
pip install transformers
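In case it helps with diagnosis, here is a small stdlib-only snippet to report which versions of the relevant packages ended up installed (each may also simply be absent):

```python
import importlib

# Print the installed version of each package relevant to this issue;
# packages that cannot be imported are reported as missing.
for pkg in ("torch", "torch_geometric", "tensorflow", "transformers"):
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed")
```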
When I try to run the README example, I get an exception on
model.train(data.x_train, data.y_train)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 model.train(data.x_train, data.y_train)
File ~/software/MoleculeACE/MoleculeACE/models/utils.py:82, in GNN.train(self, x_train, y_train, x_val, y_val, early_stopping_patience, epochs, print_every_n)
78 break
80 # As long as the model is still improving, continue training
81 else:
---> 82 loss = self._one_epoch(train_loader)
83 self.train_losses.append(loss)
85 val_loss = 0
File ~/software/MoleculeACE/MoleculeACE/models/utils.py:119, in GNN._one_epoch(self, train_loader)
116 self.optimizer.zero_grad()
118 # Forward pass
--> 119 y_hat = self.model(batch.x.float(), batch.edge_index, batch.edge_attr.float(), batch.batch)
121 # Calculating the loss and gradients
122 loss = self.loss_fn(squeeze_if_needed(y_hat), squeeze_if_needed(batch.y))
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/software/MoleculeACE/MoleculeACE/models/mpnn.py:104, in MPNNmodel.forward(self, x, edge_index, edge_attr, batch)
101 node_feats = node_feats.squeeze(0)
103 # perform global pooling using a multiset transformer to get graph-wise hidden embeddings
--> 104 out = self.transformer(node_feats, batch, edge_index)
106 # Apply a fully connected layer.
107 for k in range(len(self.fc)):
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/aggr/base.py:131, in Aggregation.__call__(self, x, index, ptr, dim_size, dim, **kwargs)
126 if index.numel() > 0 and dim_size <= int(index.max()):
127 raise ValueError(f"Encountered invalid 'dim_size' (got "
128 f"'{dim_size}' but expected "
129 f">= '{int(index.max()) + 1}')")
--> 131 return super().__call__(x, index, ptr, dim_size, dim, **kwargs)
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/aggr/gmt.py:245, in GraphMultisetTransformer.forward(self, x, index, ptr, dim_size, dim, edge_index)
243 for i, (name, pool) in enumerate(zip(self.pool_sequences, self.pools)):
244 graph = (x, edge_index, index) if name == 'GMPool_G' else None
--> 245 batch_x = pool(batch_x, graph, mask)
246 mask = None
248 return self.lin2(batch_x.squeeze(1))
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/aggr/gmt.py:133, in PMA.forward(self, x, graph, mask)
127 def forward(
128 self,
129 x: Tensor,
130 graph: Optional[Tuple[Tensor, Tensor, Tensor]] = None,
131 mask: Optional[Tensor] = None,
132 ) -> Tensor:
--> 133 return self.mab(self.S.repeat(x.size(0), 1, 1), x, graph, mask)
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/aggr/gmt.py:59, in MAB.forward(self, Q, K, graph, mask)
57 if graph is not None:
58 x, edge_index, batch = graph
---> 59 K, V = self.layer_k(x, edge_index), self.layer_v(x, edge_index)
60 K, _ = to_dense_batch(K, batch)
61 V, _ = to_dense_batch(V, batch)
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/conv/gcn_conv.py:198, in GCNConv.forward(self, x, edge_index, edge_weight)
195 x = self.lin(x)
197 # propagate_type: (x: Tensor, edge_weight: OptTensor)
--> 198 out = self.propagate(edge_index, x=x, edge_weight=edge_weight,
199 size=None)
201 if self.bias is not None:
202 out = out + self.bias
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py:392, in MessagePassing.propagate(self, edge_index, size, **kwargs)
389 if res is not None:
390 edge_index, size, kwargs = res
--> 392 size = self.__check_input__(edge_index, size)
394 # Run "fused" message and aggregation (if applicable).
395 if is_sparse(edge_index) and self.fuse and not self.explain:
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py:216, in MessagePassing.__check_input__(self, edge_index, size)
213 the_size[1] = size[1]
214 return the_size
--> 216 raise ValueError(
217 ('`MessagePassing.propagate` only supports integer tensors of '
218 'shape `[2, num_messages]`, `torch_sparse.SparseTensor` or '
219 '`torch.sparse.Tensor` for argument `edge_index`.'))
ValueError: `MessagePassing.propagate` only supports integer tensors of shape `[2, num_messages]`, `torch_sparse.SparseTensor` or `torch.sparse.Tensor` for argument `edge_index`.
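For what it's worth, the error message says edge_index must be an integer tensor of shape [2, num_messages] (or a sparse tensor). A stdlib-only sketch of that shape requirement (my own illustration, not PyG's actual code):

```python
def check_edge_index(edge_index):
    # Loosely mirrors the failing check: edge_index must be a
    # 2 x num_messages structure of integers (source row, target row).
    if len(edge_index) != 2 or any(
            not isinstance(i, int) for row in edge_index for i in row):
        raise ValueError(
            "edge_index must be an integer structure of shape [2, num_messages]")
    return True

check_edge_index([[0, 1, 1, 2], [1, 0, 2, 1]])  # passes: two rows of ints
```

So whatever the GraphMultisetTransformer forwards as edge_index in this environment apparently no longer satisfies that contract.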
Hi, it is very helpful that you provided the relevant datasets.
However, there is one thing I am concerned about: does your benchmark dataset include explicit relationships between cliff molecules? In other words, can we know exactly which pairs of molecules have similar structures but significantly different properties?
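To illustrate what I mean, here is a hypothetical helper (the thresholds are illustrative, not necessarily the paper's) that would flag such a pair given a structural similarity score and two potencies in log units:

```python
def is_cliff_pair(similarity, pki_a, pki_b,
                  sim_threshold=0.9, delta_threshold=1.0):
    # Hypothetical definition: structurally similar (similarity at or above
    # a threshold) yet at least delta_threshold log units apart in potency.
    return similarity >= sim_threshold and abs(pki_a - pki_b) >= delta_threshold

print(is_cliff_pair(0.95, 6.0, 8.5))  # True: similar structures, 2.5 log units apart
print(is_cliff_pair(0.40, 6.0, 8.5))  # False: the structures are not similar
```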
Thanks,
Hi, thanks for sharing the code.
However, as far as I can tell, you only split the data into training and test sets and omit a validation split.
It is important to have both a training and a validation set; otherwise, you have no way of knowing when to stop training or which model is best. I don't believe cross-validation alone avoids the drawback of a missing validation split.
Could you please comment on this?
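To illustrate the point, a minimal stdlib sketch of carving a validation set out of the data (a hypothetical helper, not the repo's API; a real split for this benchmark would also need to respect structural similarity):

```python
import random

def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=42):
    # Simple random three-way split: shuffle, then slice off test and
    # validation portions, keeping the remainder for training.
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    return items[n_test + n_val:], items[n_test:n_test + n_val], items[:n_test]

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```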
Hey, I really appreciate your work - thank you very much for sharing the code and the data.
I found an inconsistency that I couldn't wrap my head around, and would like to ask you to clarify directly:
When looking at the data here:
https://github.com/molML/MoleculeACE/blob/main/MoleculeACE/Data/benchmark_data/CHEMBL2147_Ki.csv
the file has a column called "exp_mean [nM]", and a "y" column which should be the -log10(exp_mean), according to visual inspection and to what you wrote in the paper: "The mean Ki or EC50 value for each molecule was computed and subsequently converted into pEC50/pKi values (as the negative logarithm of molar concentrations)"
However, there is an issue: SMILES with the same "exp_mean" value (e.g. 100 nM) have "y" values that are either positive or negative (e.g. 2 or -2 in the example below), and I haven't found any way to make sense of this.
smiles | exp_mean [nM] | y |
---|---|---|
Cc1cncc(-c2cc3c(-c4cccc(N5CCNCC5)n4)n[nH]c3cn2)n1 | 100 | 2 |
Cc1ccc(F)c(-c2nc(C(=O)Nc3cnn(C)c3N3CCCC@@HCC3)c(N)s2)c1F | 100 | 2 |
Cn1ncc(NC(=O)c2nc(-c3ccccc3F)sc2N)c1N1CCC@HCC(F)(F)C1 | 100 | 2 |
Nc1sc(-c2c(F)cccc2F)nc1C(=O)Nc1cnn(C2CC2)c1N1CCC@HCC(F)(F)C1 | 100 | 2 |
C=C(C)c1ccc(-c2n[nH]c3cnc(-c4cccnc4)cc23)nc1N1CCCC@HC1 | 100 | 2 |
C#Cc1ccc(-c2n[nH]c3cnc(-c4cccnc4)cc23)nc1N1CCCC@HC1 | 100 | 2 |
Cn1ncc(NC(=O)c2nc(-c3ccc(C(F)(F)F)cc3F)sc2N)c1[C@@h]1CCC@@HC@@HCO1 | 100 | 2 |
CO[C@H]1COC@HCC[C@H]1N | 100 | 2 |
Cn1ncc(NC(=O)c2csc(-c3c(F)cc(C4(F)COC4)cc3F)n2)c1[C@@h]1CCC@@HC@HCO1 | 100 | 2 |
Nc1sc(-c2c(F)cccc2F)nc1C(=O)Nc1cnccc1N1CCCC@HC1 | 100 | 2 |
CN1CCC(N(C)c2ccc3nnc(-c4cccc(C(F)(F)F)c4)n3n2)CC1 | 100 | -2 |
Cn1c2ccccc2c2c3c(c4c5ccccc5n(CCC#N)c4c21)CNC3=O | 100 | -2 |
c1ccc(CNc2cc(-c3c[nH]c4ncccc34)ncn2)cc1 | 100 | -2 |
CSc1ccc2nc3c(c(Cl)c2c1)CCNC3=O | 100 | -2 |
Cc1n[nH]c2ccc(-c3cncc(OCC(N)Cc4ccccc4)c3)cc12 | 100 | -2 |
O=c1[nH]c2sc3c(c2c2nc(-c4ccccc4)nn12)CCCC3 | 100 | -2 |
O=C1NC(=O)C(c2c[nH]c3ccccc23)=C1c1nc(N2CCNCC2)nc2ccccc12 | 100 | -2 |
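For reference, the conversion described in the paper, pKi = -log10(Ki in molar units), can be checked directly; assuming 1 nM = 1e-9 M, 100 nM should give 7, whereas taking the logarithm of the raw nM value gives exactly the 2 / -2 seen above:

```python
import math

def pki_from_nm(ki_nm):
    # pKi = -log10(Ki expressed in mol/L); 1 nM = 1e-9 M.
    return -math.log10(ki_nm * 1e-9)

print(round(pki_from_nm(100), 6))  # 7.0
print(-math.log10(100))            # -2.0 (-log10 of the raw nM value)
print(math.log10(100))             # 2.0 (log10 of the raw nM value)
```

So neither 2 nor -2 matches the molar conversion, which is why the column confuses me; perhaps I am misreading it.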
Could you please clarify the origin of this inconsistency?
Thank you!
Hi,
I was trying to run the first "Getting started" example in README.md and I ran into a problem executing the line
model = algorithm(hyperparameters)
It looks like the hyperparameters are not compatible with the model constructor.
Thanks in advance for your help!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 algorithm(hyperparameters)
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/MoleculeACE/models/mpnn.py:28, in MPNN.__init__(self, node_in_feats, node_hidden, edge_in_feats, edge_hidden, message_steps, dropout, transformer_heads, transformer_hidden, seed, fc_hidden, n_fc_layers, lr, epochs, *args, **kwargs)
22 def __init__(self, node_in_feats: int = 37, node_hidden: int = 64, edge_in_feats: int = 6,
23 edge_hidden: int = 128, message_steps: int = 3, dropout: float = 0.2,
24 transformer_heads: int = 8, transformer_hidden: int = 128, seed: int = RANDOM_SEED,
25 fc_hidden: int = 64, n_fc_layers: int = 1, lr: float = 0.0005, epochs: int = 300, *args, **kwargs):
26 super().__init__()
---> 28 self.model = MPNNmodel(node_in_feats=node_in_feats, node_hidden=node_hidden, edge_in_feats=edge_in_feats,
29 edge_hidden=edge_hidden, message_steps=message_steps, dropout=dropout,
30 transformer_heads=transformer_heads, transformer_hidden=transformer_hidden, seed=seed,
31 fc_hidden=fc_hidden, n_fc_layers=n_fc_layers)
33 self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
34 self.loss_fn = torch.nn.MSELoss()
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/MoleculeACE/models/mpnn.py:62, in MPNNmodel.__init__(self, node_in_feats, node_hidden, edge_in_feats, edge_hidden, message_steps, dropout, transformer_heads, transformer_hidden, seed, fc_hidden, n_fc_layers, *args, **kwargs)
59 self.node_in_feats = node_in_feats
61 # Layer to project node features to hidden features
---> 62 self.project_node_feats = Sequential(Linear(node_in_feats, node_hidden), ReLU())
64 # The 'learnable message function'
65 edge_network = Sequential(Linear(edge_in_feats, edge_hidden), ReLU(),
66 Linear(edge_hidden, node_hidden * node_hidden))
File ~/anaconda3/envs/moleculeACE/lib/python3.8/site-packages/torch/nn/modules/linear.py:96, in Linear.__init__(self, in_features, out_features, bias, device, dtype)
94 self.in_features = in_features
95 self.out_features = out_features
---> 96 self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
97 if bias:
98 self.bias = Parameter(torch.empty(out_features, **factory_kwargs))
TypeError: empty(): argument 'size' must be tuple of SymInts, but found element of type dict at pos 2
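My guess is that the dict is being passed positionally, so the whole thing binds to node_in_feats and eventually ends up inside torch.empty. A minimal reproduction with a hypothetical stand-in function (not the actual MPNN signature):

```python
def mpnn(node_in_feats=37, node_hidden=64, **kwargs):
    # Stand-in for a constructor; just returns what it received.
    return node_in_feats, node_hidden

hyperparameters = {"node_hidden": 128}
print(mpnn(hyperparameters))    # the whole dict binds to node_in_feats
print(mpnn(**hyperparameters))  # keyword unpacking: (37, 128)
```

If the example is supposed to read algorithm(**hyperparameters), that would explain the TypeError; otherwise the expected hyperparameter format may have changed.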
In GNN, if early_stopping_patience is left at its default of None, utils.py line 66
if patience is not None and patience >= early_stopping_patience:
will raise
TypeError: '>=' not supported between instances of 'int' and 'NoneType'
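A guarded comparison would avoid this; a sketch of the condition (not a patch against the actual file):

```python
def should_stop(patience, early_stopping_patience):
    # Only compare when early stopping is enabled; with the default of
    # None the check is skipped entirely.
    return (early_stopping_patience is not None
            and patience is not None
            and patience >= early_stopping_patience)

print(should_stop(10, None))  # False: early stopping disabled
print(should_stop(10, 5))     # True: patience exhausted
```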
Small thing: the documentation suggests "git clone https://github.com/derekvantilborg/MoleculeACE", but the repository has moved to molML now.
MoleculeACE/MoleculeACE/benchmark/cliffs.py
Line 120 in 024ef21
m[i, j] = 1 - (levenshtein(smiles[i], smiles[j]) / max(len(smiles[i]), len(smiles[j])))
Hi
I want to ask whether there is an issue with this line of code. Why is a sigmoid function applied after obtaining the atomic mass features?
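For context, the quoted cliffs.py line computes a normalized string similarity: one minus the Levenshtein distance divided by the length of the longer SMILES. A self-contained sketch of that computation (my own implementation, not the repo's):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def smiles_similarity(s1, s2):
    # Mirrors cliffs.py line 120: 1 - normalized edit distance.
    return 1 - levenshtein(s1, s2) / max(len(s1), len(s2))

print(round(smiles_similarity("CCO", "CCN"), 3))  # 0.667
```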