Comments (8)

isayev commented on June 7, 2024

JSON is nice; I will have a look. Hopefully it is not like in the XKCD comic.

from ani1_dataset.

isayev commented on June 7, 2024

@jchodera John, sorry for the sluggish response. We have the molecular topologies as SMILES strings. Let me find them for you.

jchodera commented on June 7, 2024

The SMILES strings are still not quite enough to uniquely identify which atom indices go with which atoms in the molecular topology. Did you at least use a deterministic piece of code to go from SMILES -> unique atom ordering?
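To get a feel for how large the ambiguity can be, here is a rough sketch. It assumes each record stores only a per-atom element list next to the coordinates (the `ethanol_species` list is a made-up example, not taken from the dataset) and counts how many index assignments are compatible with the element labels alone. This is only an upper bound: molecular symmetry makes many assignments equivalent, and geometry rules out others.

```python
from collections import Counter
from math import factorial


def element_consistent_assignments(species):
    """Count index permutations that leave every element label unchanged."""
    count = 1
    for n_atoms in Counter(species).values():
        count *= factorial(n_atoms)
    return count


# Ethanol (SMILES "CCO") as a per-atom element list with explicit hydrogens.
# Hypothetical example record, not taken from the dataset.
ethanol_species = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
n_assignments = element_consistent_assignments(ethanol_species)
print(n_assignments)  # 2! * 1! * 6! = 1440
```

Even for a molecule this small, the element labels leave many candidate mappings, which is why a deterministic SMILES -> atom ordering step (or an explicit topology file) matters.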

jchodera commented on June 7, 2024

We actually had a timely discussion with @dgasmith this weekend about how we might better facilitate interoperability between quantum chemistry and molecular mechanics topology representations, especially in light of the new JSON schema being developed for quantum chemistry.

isayev commented on June 7, 2024

Ugh... true. We would need to ask in-house Jedi master @Jussmith01 for that.

jchodera commented on June 7, 2024

> Ugh... true. We would need to ask in-house Jedi master @Jussmith01 for that.

SMILES and a short piece of code to reproducibly generate the molecular topology would be sufficient, but it would be much more robust to just have a big multi-molecule mol2 or SDF tarball that uses the same database keys, since this would guard against changes to upstream codes (like RDKit) that change atom ordering.
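As an illustration of what such a keyed record could look like, here is a minimal pure-Python sketch that writes one SDF (V2000) record with the atoms in exactly the order given. The function name `sdf_record`, the `DATABASE_KEY` data field, and the key `mol_000001` are hypothetical; the fixed-width layout follows the V2000 molfile format as I understand it, so output should be validated with a real toolkit before anyone relies on it.

```python
def sdf_record(key, symbols, coords, bonds):
    """Return one V2000 SDF record; atoms are written in the order given."""
    # Three header lines: title, program/timestamp line, comment.
    lines = [key, "  hypothetical-writer", ""]
    # Counts line: number of atoms, number of bonds, then fixed fields.
    lines.append(f"{len(symbols):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for sym, (x, y, z) in zip(symbols, coords):
        lines.append(
            f"{x:10.4f}{y:10.4f}{z:10.4f} {sym:<3s} 0  0  0  0  0  0  0  0  0  0  0  0"
        )
    for i, j, order in bonds:  # 0-based indices in, 1-based in the file
        lines.append(f"{i + 1:3d}{j + 1:3d}{order:3d}  0")
    lines.append("M  END")
    # Data item carrying the database key, then the record terminator.
    lines += [">  <DATABASE_KEY>", key, "", "$$$$"]
    return "\n".join(lines) + "\n"


water = sdf_record(
    "mol_000001",  # hypothetical database key
    ["O", "H", "H"],
    [(0.0, 0.0, 0.1173), (0.0, 0.7572, -0.4692), (0.0, -0.7572, -0.4692)],
    [(0, 1, 1), (0, 2, 1)],  # (atom i, atom j, bond order)
)
print(water)
```

Concatenating one such record per molecule gives a multi-molecule SDF in which atom index k in the file is atom index k in the coordinate arrays, independent of any toolkit's canonicalization.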

roitberg commented on June 7, 2024

Hi John,
We can certainly go xyz --> mol2, but I am not sure the bond orders, etc. will be there. I am also slightly worried about the following. Take molecule i, for which we have N 'conformations'. Since we are doing some pretty serious normal-mode displacements for sampling, one can imagine conformations having different bond orders depending on whatever algorithm one uses to create the mol2 file. This is either good news (since it is possible that stretching a bond can give you a change in bond order) or bad news (if you somehow use this data assuming the same bond orders for all conformers).
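A small sketch of this concern, using connectivity rather than bond orders as a simpler proxy: a purely distance-based perception rule can assign different bonds to two geometries of the same molecule. The rule below (bonded if closer than the sum of covalent radii plus a tolerance) is a common heuristic, not the method any particular xyz --> mol2 converter uses; the radii are approximate, and both water geometries are invented for illustration.

```python
import math

# Approximate single-bond covalent radii in angstroms (illustrative values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(symbols, coords, tolerance=0.4):
    """Return the set of (i, j) pairs closer than r_i + r_j + tolerance."""
    bonds = set()
    for i in range(len(symbols)):
        for j in range(i + 1, len(symbols)):
            cutoff = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + tolerance
            if math.dist(coords[i], coords[j]) < cutoff:
                bonds.add((i, j))
    return bonds


symbols = ["O", "H", "H"]
# Made-up geometries: near equilibrium, and with one O-H bond strongly stretched.
equilibrium = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
stretched = [(0.0, 0.0, 0.0), (1.60, 0.0, 0.0), (-0.24, 0.93, 0.0)]

print(sorted(perceive_bonds(symbols, equilibrium)))  # [(0, 1), (0, 2)]
print(sorted(perceive_bonds(symbols, stretched)))    # [(0, 2)]
```

Running a check like this over all conformers of a molecule would flag the ones whose perceived connectivity differs from the parent topology, without deciding whether that is the good-news or the bad-news case.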

jchodera commented on June 7, 2024

> We can certainly go xyz --> mol2, but I am not sure the bond orders, etc. will be there.

In the RDKit stage of your processing, these molecules must have had a well-defined set of bond orders and topology; otherwise, RDKit would not have been able to process them. That representation should be sufficient to write out in mol2 or SDF format.

You are certainly correct that the subsequent perturbations might distort the bond orders or even the perceived chemical connectivity! It may be possible for us to deal with this effectively by computing bond orders (e.g. Wiberg bond orders), though I'm not sure we could afford to evaluate them at the same level of theory you used.

Even with the chemical distortion issue, I think it would be super useful to have the provenance information available (via mol2/SDF) for which chemical topology each of these structures originated from.

P.S. Happy New Year!
