Comments (8)
JSON is nice, I will have a look. Hopefully it is not like in the XKCD comic.
from ani1_dataset.
@jchodera John, sorry for a sluggish response. We have molecular topologies as SMILES strings. Let me find them for you.
The SMILES strings are still not quite enough to uniquely identify which atom indices go with which atoms in the molecular topology. Did you at least use a deterministic piece of code to go from SMILES -> unique atom ordering?
We actually had a timely discussion with @dgasmith this weekend about how we might better facilitate interoperability between quantum chemistry and molecular mechanics topology representations, especially in light of the new JSON schema being developed for quantum chemistry.
Ugh... true. We would need to ask in-house Jedi master @Jussmith01 for that.
> Ugh... true. We would need to ask in-house Jedi master @Jussmith01 for that.
SMILES and a short piece of code to reproducibly generate the molecular topology would be sufficient, but it would be much more robust to just have a big multi-molecule mol2 or SDF tarball that has the same database keys, since this would guard against changes to upstream codes (like RDKit) that change atom ordering.
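A minimal sketch of the "same database keys" idea: store an explicit atom order and bond list per key, then check it against the element order of a coordinate record. JSON stands in for mol2/SDF here only to keep the example dependency-free. The key `mol_0001`, the field names, and both helper functions are hypothetical and not part of the dataset.

```python
import hashlib
import json


def make_record(elements, bonds):
    """Build a topology record with an explicit atom order and bond list.

    elements: element symbols in the exact order used by the coordinates.
    bonds: (i, j, order) tuples using 0-based atom indices.
    """
    record = {
        "elements": list(elements),
        "bonds": sorted([min(i, j), max(i, j), order] for i, j, order in bonds),
    }
    # Checksum over a canonical serialization, so later edits are detectable.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record


def matches_coordinates(record, species):
    """Return True if the stored atom order agrees with a coordinate record."""
    return record["elements"] == list(species)


# Hypothetical database key; methanol written as C, O, then four H atoms.
topologies = {
    "mol_0001": make_record(
        ["C", "O", "H", "H", "H", "H"],
        [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (1, 5, 1)],
    )
}

# Round-trip through JSON, as a stored manifest would be.
manifest = json.dumps(topologies, indent=2, sort_keys=True)
restored = json.loads(manifest)

print(matches_coordinates(restored["mol_0001"], ["C", "O", "H", "H", "H", "H"]))
print(matches_coordinates(restored["mol_0001"], ["O", "C", "H", "H", "H", "H"]))
```

Because the atom order is written down once rather than regenerated from SMILES, a later change in a toolkit's atom ordering shows up as a mismatch instead of silently mislabeling atoms.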
Hi John,
We can certainly go xyz --> mol2, but I am not sure the bond orders, etc. will be there. I am also slightly worried about the following. Take molecule i, for which we have N ‘conformations’. Since we are doing some pretty serious normal-mode displacements for sampling, one can imagine conformations having different bond orders according to whatever algorithm one uses to create the mol2 file. This is either good news (since it is possible that stretching a bond can give you a change in bond order) or bad news (if somehow you will use this data assuming the same bond orders for all conformers).
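To make the concern concrete, here is a rough sketch of how a purely geometric perception rule can assign different connectivity to two conformations of the same molecule. It only detects whether a bond is present, not its order. The cutoff rule (sum of covalent radii plus a 0.4 angstrom tolerance), the function name, and the water-like test geometries are illustrative assumptions, not the algorithm any particular xyz-to-mol2 converter uses.

```python
import math

# Single-bond covalent radii in angstroms (Cordero et al. values) for H, C, N, O.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(species, coords, tolerance=0.4):
    """Return the set of (i, j) pairs closer than r_i + r_j + tolerance.

    The cutoff rule and the default tolerance are illustrative assumptions.
    """
    bonds = set()
    for i in range(len(species)):
        for j in range(i + 1, len(species)):
            cutoff = COVALENT_RADII[species[i]] + COVALENT_RADII[species[j]] + tolerance
            if math.dist(coords[i], coords[j]) < cutoff:
                bonds.add((i, j))
    return bonds


# Water-like geometry: O at the origin, both H atoms about 0.96 angstrom away.
species = ["O", "H", "H"]
equilibrium = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
# Same atoms with one O-H distance stretched to 1.60 angstrom.
stretched = [(0.0, 0.0, 0.0), (1.60, 0.0, 0.0), (-0.24, 0.93, 0.0)]

print(sorted(perceive_bonds(species, equilibrium)))  # both O-H bonds found
print(sorted(perceive_bonds(species, stretched)))    # stretched O-H bond is lost
```

With these inputs the stretched conformer loses one O-H bond, so a per-conformer mol2 file would disagree with the parent topology. That is one argument for recording the parent topology once per molecule rather than re-perceiving it for every conformation.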
> We can certainly go xyz --> mol2, but I am not sure the bond orders, etc will be there.
In the RDKit stage of your processing, these molecules must have a well-defined set of bond orders and topology---otherwise, RDKit would not have been able to process them. That representation should be sufficient to write out as mol2 or SDF format.
You are certainly correct that the subsequent perturbations might distort the bond orders or even perceived chemical connectivity! It may be possible for us to effectively deal with this through the computation of bond orders (e.g. Wiberg bond orders), though I'm not sure we could afford to do the same level of theory to evaluate this that you've done.
Despite the chemical distortion issue, I think it would be super useful if the provenance information for which chemical topology these structures originated from (via mol2/SDF) were available.
P.S. Happy New Year!