During <a class="issue-link js-issue-link" data-error-text="Failed to load title" data

Thanks for starting this discussion, <a class="user-mention notranslate" data-hovercar

Thanks, <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Generate `basis` if not specified in OpenMMHarness,about molssi/qcengine

Comments (18)

mattwelborn commented on August 16, 2024 3

In row 3 of the table, is there a typo in the version?

from qcengine.

j-wags commented on August 16, 2024

Thanks for starting this discussion, @dotsdl.

@dgasmith brought up this Issue this morning, and we had a productive conversation. Basically, we thought a scheme like this would be good starting point for discussion

Use examples

Desired force field	Program	Method	Basis	Keywords
openff-1.0.0	openforcefield	parsley	1.0.0	constraints:true
openff_unconstrained-1.0.0	openforcefield	parsley	1.0.0	none
Parsley with some simple bugfix (1.0.1)	openforcefield	parsley	1.0.1	none
Sage	openforcefield	sage	1.1.0	None
Sage	openforcefield	sage	Latest?	None
Rdkit UFF	rdkit	uff	None	None
AMBER GAFF	ambertools	gaff	2.11	{ES/NB cutoff, etc}

The "openforcefield" program string would be used to mean "the openforcefield toolkit and all dependencies, including OpenMM and openforcefields (the conda package containing Parsley)"

The "constraints" keyword

Parsley's technical (actual) name is "openff-1.0.0.offxml", and it is intended for long-timestep MD. There is another "Parsley" that is intended for small molecule operations, which differs only by the removal of hydrogen bond length constraints, called `openff_unconstrainted-1.0.0.offxml". We understand that this is confusing, and it's caused by the fact that OpenFF and OpenMM consider the hbond constraints to be part of the FF (since it affects system energy) while most other comp chem packages don't. It's very confusing for everyone involved, but we have to do it this way unless we rewrite large portions of ParmEd.

We basically came to the conclusion that, if someone is running Parsley in QCEngine, they probably don't want hbond length constraints, so the default should be to use the unconstrained file, however the constrained one could be requested by adding keywords (like hbond_constraints: true

FF codenames and semantic version numbers

Another issue was the fact that Parsley is anticipated to be the whole 1.0.Z series of FFs. Since the main purpose of OpenFF is to show incremental, largely automated improvement in FFs, we left ourselves wiggle room to have "bugfix" version increments within the same FF name. Currently there is no openff-1.0.1, but there easily could be in the future. This has two consequences:

if we have "basis" be the X.Y.Z version, there will need to be some validation that occurs to ensure that "Parsley" lines up with "1.0.Z", and "Sage" lines up with "1.1.Z"
there is some danger in allowing Latest as a basis, since Parsley Latest could refer to different FFs on different days. We could consider these to be "significant" changes. For this reason, we think we should forbid "Latest" basis values on the global QCArchive, while QCEngine could support it for local runs.

from qcengine.

jthorton commented on August 16, 2024

Thanks, @j-wags for the detailed post on the topic.

After thinking about the proposed scheme from the early discussion I would say that it looks good but would suggest some minor tweaks. I mainly think that the method should imply the X.Y of the version number (for example parsley-> 1.0.Z sage-> 1.1.Z ) and basis should just cover the minor bump number. In this case, I would not allow "latest" as an option to keep consistent with the rest of QCEngine where a basis should be defined or None. We could possibly allow None to default to Z=0 version of the force field which implies you just want initial parsley or sage with no extras.

from qcengine.

dgasmith commented on August 16, 2024

As a note we are considering the program=openforcefield rather than program=openmm for the following reasons:

QCEngine wants to know a priori if it can run a task or not which has been chosen to have a single boolean at the program found level for simplicity. Since OpenMM may have many programs (or none) to type and generate the System object (openforcefield, RDKit, Antechamber, etc) this becomes highly parametrizable and the ability to run a task now depends on not just the program, but the route inside the harness the task will take.
It simplifies the complexities of running MM computations. MM in QCEngine is always going to be a bit awkward, at the end of the day I think we will have two users here 1) I am a QC person and just want to run a forcefield (really any force field) to clean up my molecule. 2) I want to benchmark my forcefield within a QC framework.
program=openforcefield would rely on the OpenMMHarness and would lock in System generation based on its own best practices and thus have a concrete dependancy tree.

Point 1) Is really a blocker from a Fractal usage point of view. This is solvable, but would take some moderate engineering time.

@loriab

from qcengine.

jchodera commented on August 16, 2024

This splitting sounds pretty complicated, since we have to hard-code a translation table that allows us to go from a force field name to the JSON entries and vice-versa in order to be able to interpret anything. Something as simple as querying for "show me data with openff-1.0.0" becomes quite complicated and difficult to search for since complex queries are required.

I was going to propose we shift to using openmm-forcefields under the hood to shift the burden if resolving how to parameterize small molecules to the OpenMM devs, since we can easily pick up GAFF 1/2 support this way and add more support in the future through the SystemGenerator mechanism, which provides a mechanism for querying which force fields via SystemGenerator.SMALL_MOLECULE_FORCEFIELDS. However, this would only report those that are potentially available---what is actually able to be calculated will depend on what the qcfractal workers have installed.

The drawback is that there are a couple of things that have to happen before everything is released on conda, so we can continue to use the current proposed method as long as we standardize the program/method/basis/keywords choices.

from qcengine.

dgasmith commented on August 16, 2024

I don't quite understand, splitting would make this as easy if not easier. program='openforcefield', method='parsley', method=parsley, basis='1.0.2' would all return relevant queries.

Using openmm-forcefields as a program where it depends on other libraries installed is going to be a nightmare from a Fractal point of view (think hf3c errors, but more systemic).

from qcengine.

jchodera commented on August 16, 2024

I don't quite understand, splitting would make this as easy if not easier. program='openforcefield', method='parsley', method=parsley, basis='1.0.2' would all return relevant queries.

Here's why this is confusing: The way this force field is instantiated in Python is:

from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openff_unconstrained-1.0.0.offxml')

Or, if using the constrained version (program='openforcefield', method='parsley', method=parsley, basis='1.0.2', keywords=['constraints:true']),

from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openff-1.0.0.offxml')

The mapping table to translate back and forth between the filename the force field refers to and the query you must use to find results for it (or, equivalently, the information contained in the JSON file) is so complicated that even our lead software engineer made at least one mistake in constructing it.

Unless I'm missing something that would easily automate the translation back and forth between the string used to invoke the force field through ForceField and the rendering of this in QCSchema, the proposed scheme is just too complicated to be practical.

Edit: This is so complicated I also made a mistake in the first version of this post.

from qcengine.

jchodera commented on August 16, 2024

Using openmm-forcefields as a program where it depends on other libraries installed is going to be a nightmare from a Fractal point of view (think hf3c errors, but more systemic).

One other practical point here: The dependencies that openmm-forcefield requires still need to be installed in order to compute the energies on the QCFractal client, so we're going to run into this issue no matter what. Perhaps we need some way for QCFractal clients to advertise their capabilities?

from qcengine.

dgasmith commented on August 16, 2024

@jchodera Fair enough, can you build a table similar to @j-wags so that we can get something concrete to look at implementing here?

from qcengine.

jchodera commented on August 16, 2024

The most important idea is to make sure we keep the prefixes whole (e.g. smirnoff99Frosst-1.1.0, openff-1.0.0) so that it is unambiguous how to both find the data and identify the force field that generated it.

Constraints are specified unambiguously in the .offxml file (formed by prefix + '.offxml' for SMIRNOFF force fields).

Keywords can instead reflect any special keywords that are used to instantiate system construction for GAFF force fields (added when we switch to openmm-forcefields in a later PR).

The "Program" can refer to the program that actually computed the energy.

The "Method" can refer to the typing engine that provided the parameters (e.g. smirnoff vs antechamber)

The "Basis" can refer to the force field specification string that uniquely identifies the file feeding the "Method" (e.g. openff-1.0.0, smirnoff99Frosst-1.1.0, gaff-1.81, gaff-2.11).

If we want, we can add "codename" to "keywords" for convenience, but these are just codenames.

Here's an example of how that would look when we expand support with openmm-forcefields and RDKit:

Desired force field	Program	Method	Basis	Keywords
smirnoff99Frosst-1.1.0	openmm	smirnoff	smirnoff99Frosst-1.1.0	none
openff-1.0.0	openmm	smirnoff	openff-1.0.0	none
openff_unconstrained-1.0.0	openmm	smirnoff	openff_unconstrained-1.0.0	none
openff-1.0.1	openmm	smirnoff	openff-1.0.1	none
openff-1.1.0	openmm	smirnoff	openff-1.1.0	none
AMBER gaff-1.81	openmm	antechamber	gaff-1.81	{'constraints:HBonds', 'nonbondedMethod:NoCutoff'}
AMBER gaff-2.11	openmm	antechamber	gaff-2.11	{'constraints:none', 'nonbondedMethod:NoCutoff'}
RDKit UFF	rdkit	ForceField	UFF	none
RDKit MMFF94	rdkit	ForceField	MMFF94	none
RDKit MMFF94s	rdkit	ForceField	MMFF94s	none
OpenEye MMFF94	oechem	szybki	MMFF94	none
OpenEye MMFF94s	oechem	szybki	MMFF94s	none

To me, this looks much simpler, and it also keeps the implementation simple and extensible (since there's a simple way to dispatch the basis field to calculations under the hood.

from qcengine.

dgasmith commented on August 16, 2024

Seems structurally good. One item is to consider switching Method/Basis, something like UFF/GAFF make much more sense in the QC world as a method while something like the typing engine refers to the basis.

from qcengine.

loriab commented on August 16, 2024

Using openmm-forcefields as a program where it depends on other libraries installed is going to be a nightmare from a Fractal point of view (think hf3c errors, but more systemic).

One other practical point here: The dependencies that openmm-forcefield requires still need to be installed in order to compute the energies on the QCFractal client, so we're going to run into this issue no matter what. Perhaps we need some way for QCFractal clients to advertise their capabilities?

I'm not sure how relevant this would be, but it is possible to modify the found method to additionally take method/driver (or AtomicInput containing) and judge whether additional deps are available to satisfy that method. This wouldn't help with the broad available methods list from qcng but could help with a quick exit at runtime since found is called at the start of compute.

from qcengine.

jchodera commented on August 16, 2024

Seems structurally good. One item is to consider switching Method/Basis, something like UFF/GAFF make much more sense in the QC world as a method while something like the typing engine refers to the basis.

I'm happy to defer to your preferences here! To me, the important principles are simply to keep the prefix needed to form the offxml file on the search path (e.g. smirnoff99Frosst-1.1.0) or drive SystemGenerator (e.g. gaff-1.81) whole.

from qcengine.

dgasmith commented on August 16, 2024

@dotsdl @mattwelborn @j-wags @loriab

Any comments before I implement this?

from qcengine.

mattwelborn commented on August 16, 2024

I’m happy with the current proposal

…

On Jan 29, 2020, at 13:44, Daniel Smith ***@***.***> wrote: @dotsdl @mattwelborn @j-wags @loriab Any comments before I implement this? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

from qcengine.

j-wags commented on August 16, 2024

This looks good to me. I think it's a good change to require the FF filename/prefix instead of the "media" name, Parsley.

from qcengine.

loriab commented on August 16, 2024

First pass through whole thread -- sorry I missed where I was pinged.

Proposal looks good (insofar as my limited MM usage extends). I think I agree with switching method/basis columns here. From the QM perspective, method is the guts and basis a detail, and if any is skipped or redundant it should be basis.

from qcengine.

dotsdl commented on August 16, 2024

I like where the proposal has landed. @dgasmith please let me know if you want help with implementation or review when the PR is up.

from qcengine.

Generate `basis` if not specified in OpenMMHarness about qcengine HOT 18 OPEN

Comments (18)

Use examples

The "constraints" keyword

FF codenames and semantic version numbers

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent