I'm trying to understand your code, but I have a few questions.
First of all, I'd like to understand how you generate the r-radius subgraphs. For instance, if I run your code, the first SMILES is 'CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2', which corresponds to the molecule vector (or fingerprint):
[6, 7, 8, 9, 10, 9, 8, 7, 6, 11, 9, 8, 7, 6, 11, 8, 7, 6, 9, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12]
I'm not able to understand how to obtain this vector. I thought you were just looking for all the subgraphs within a certain radius of each atom in the molecule. In that case I would expect something like:
[0, 1, 2] [0, 1, 2, 3, 9] [0, 1, 2, 3, 4, 6, 9] [1, 2, 3, 4, 5, 9, 10, 18] [2, 3, 4, 5, 6, 10, 11, 15, 18] [3, 4, 5, 6, 7, 9, 10, 18] [2, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9] [8, 6, 7] [1, 2, 3, 5, 6, 7, 9] [3, 4, 5, 10, 11, 12, 14, 18] [4, 10, 11, 12, 13, 14, 15] [10, 11, 12, 13, 14] [11, 12, 13] [10, 11, 12, 14, 15, 16, 18] [4, 11, 14, 15, 16, 17, 18] [14, 15, 16, 17, 18] [16, 17, 15] [3, 4, 5, 10, 14, 15, 16, 18]
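To make that expectation concrete, here is a minimal sketch of how such a list can be computed. It is not taken from your code: the bond list is my own reading of the SMILES above (heavy atoms only, numbered in the order they appear in the string), the radius of 2 is an assumption that happens to reproduce the list, and radius_subgraphs is just a hypothetical helper name.

```python
from collections import deque

# Bond list read off the SMILES by hand (assumption: heavy atoms only,
# numbered in the order they appear in the SMILES string).
BONDS = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8),
    (6, 9), (9, 2), (4, 10), (10, 11), (11, 12), (12, 13), (11, 14),
    (14, 15), (15, 16), (16, 17), (15, 18), (18, 4),
]


def radius_subgraphs(bonds, n_atoms, radius):
    """Return, for each atom, the sorted atoms within `radius` bonds of it."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    subgraphs = []
    for start in range(n_atoms):
        # Breadth-first search that stops expanding once `radius` is reached.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            if dist[atom] == radius:
                continue
            for nxt in neighbors[atom]:
                if nxt not in dist:
                    dist[nxt] = dist[atom] + 1
                    queue.append(nxt)
        subgraphs.append(sorted(dist))
    return subgraphs


subgraphs = radius_subgraphs(BONDS, n_atoms=19, radius=2)
for atom, sub in enumerate(subgraphs):
    print(atom, sub)
```

With these assumptions the sketch gives one subgraph per heavy atom (19 in total), whereas the vector returned by your code has 43 entries, which is part of what confuses me.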
Moreover, I do not understand why the second entry in molecules does not start at zero anymore:
[20, 21, 22, 23, 24, 24, 24, 23, 25, 9, 10, 9, 25, 20, 21, 22, 23, 24, 24, 24, 23, 26, 27, 28, 23, 24, 24, 24, 23, 29, 29, 27, 28, 23, 24, 24, 24, 23, 26, 30, 31, 32, 32, 32, 32, 32, 30, 31, 32, 32, 32, 32, 32, 13, 13, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 13, 13]
Finally, I'd like to understand if you concatenate all the r-radius subgraphs into a single vector (one of the vectors above). Thank you.