Comments (18)
In row 3 of the table, is there a typo in the version?
from qcengine.
Thanks for starting this discussion, @dotsdl.
@dgasmith brought up this Issue this morning, and we had a productive conversation. Basically, we thought a scheme like this would be good starting point for discussion
Use examples
Desired force field | Program | Method | Basis | Keywords |
---|---|---|---|---|
openff-1.0.0 | openforcefield | parsley | 1.0.0 | constraints:true |
openff_unconstrained-1.0.0 | openforcefield | parsley | 1.0.0 | none |
Parsley with some simple bugfix (1.0.1) | openforcefield | parsley | 1.0.1 | none |
Sage | openforcefield | sage | 1.1.0 | None |
Sage | openforcefield | sage | Latest? | None |
Rdkit UFF | rdkit | uff | None | None |
AMBER GAFF | ambertools | gaff | 2.11 | {ES/NB cutoff, etc} |
The "openforcefield" program string would be used to mean "the openforcefield toolkit and all dependencies, including OpenMM and openforcefields (the conda package containing Parsley)"
The "constraints" keyword
Parsley's technical (actual) name is "openff-1.0.0.offxml", and it is intended for long-timestep MD. There is another "Parsley" that is intended for small molecule operations, which differs only by the removal of hydrogen bond length constraints, called `openff_unconstrainted-1.0.0.offxml". We understand that this is confusing, and it's caused by the fact that OpenFF and OpenMM consider the hbond constraints to be part of the FF (since it affects system energy) while most other comp chem packages don't. It's very confusing for everyone involved, but we have to do it this way unless we rewrite large portions of ParmEd.
We basically came to the conclusion that, if someone is running Parsley in QCEngine, they probably don't want hbond length constraints, so the default should be to use the unconstrained
file, however the constrained one could be requested by adding keywords (like hbond_constraints: true
FF codenames and semantic version numbers
Another issue was the fact that Parsley is anticipated to be the whole 1.0.Z series of FFs. Since the main purpose of OpenFF is to show incremental, largely automated improvement in FFs, we left ourselves wiggle room to have "bugfix" version increments within the same FF name. Currently there is no openff-1.0.1
, but there easily could be in the future. This has two consequences:
- if we have "basis" be the X.Y.Z version, there will need to be some validation that occurs to ensure that "Parsley" lines up with "1.0.Z", and "Sage" lines up with "1.1.Z"
- there is some danger in allowing
Latest
as a basis, since ParsleyLatest
could refer to different FFs on different days. We could consider these to be "significant" changes. For this reason, we think we should forbid "Latest" basis values on the global QCArchive, while QCEngine could support it for local runs.
from qcengine.
Thanks, @j-wags for the detailed post on the topic.
After thinking about the proposed scheme from the early discussion I would say that it looks good but would suggest some minor tweaks. I mainly think that the method should imply the X.Y
of the version number (for example parsley-> 1.0.Z sage-> 1.1.Z ) and basis should just cover the minor bump number. In this case, I would not allow "latest" as an option to keep consistent with the rest of QCEngine where a basis should be defined or None
. We could possibly allow None to default to Z=0
version of the force field which implies you just want initial parsley or sage with no extras.
from qcengine.
As a note we are considering the program=openforcefield
rather than program=openmm
for the following reasons:
- QCEngine wants to know a priori if it can run a task or not which has been chosen to have a single boolean at the program found level for simplicity. Since OpenMM may have many programs (or none) to type and generate the System object (openforcefield, RDKit, Antechamber, etc) this becomes highly parametrizable and the ability to run a task now depends on not just the program, but the route inside the harness the task will take.
- It simplifies the complexities of running MM computations. MM in QCEngine is always going to be a bit awkward, at the end of the day I think we will have two users here 1) I am a QC person and just want to run a forcefield (really any force field) to clean up my molecule. 2) I want to benchmark my forcefield within a QC framework.
program=openforcefield
would rely on the OpenMMHarness and would lock in System generation based on its own best practices and thus have a concrete dependancy tree.
Point 1) Is really a blocker from a Fractal usage point of view. This is solvable, but would take some moderate engineering time.
from qcengine.
This splitting sounds pretty complicated, since we have to hard-code a translation table that allows us to go from a force field name to the JSON entries and vice-versa in order to be able to interpret anything. Something as simple as querying for "show me data with openff-1.0.0
" becomes quite complicated and difficult to search for since complex queries are required.
I was going to propose we shift to using openmm-forcefields
under the hood to shift the burden if resolving how to parameterize small molecules to the OpenMM devs, since we can easily pick up GAFF 1/2 support this way and add more support in the future through the SystemGenerator
mechanism, which provides a mechanism for querying which force fields via SystemGenerator.SMALL_MOLECULE_FORCEFIELDS
. However, this would only report those that are potentially available---what is actually able to be calculated will depend on what the qcfractal workers have installed.
The drawback is that there are a couple of things that have to happen before everything is released on conda, so we can continue to use the current proposed method as long as we standardize the program/method/basis/keywords choices.
from qcengine.
I don't quite understand, splitting would make this as easy if not easier. program='openforcefield'
, method='parsley'
, method=
parsley, basis='1.0.2'
would all return relevant queries.
Using openmm-forcefields
as a program where it depends on other libraries installed is going to be a nightmare from a Fractal point of view (think hf3c errors, but more systemic).
from qcengine.
I don't quite understand, splitting would make this as easy if not easier. program='openforcefield', method='parsley', method=parsley, basis='1.0.2' would all return relevant queries.
Here's why this is confusing: The way this force field is instantiated in Python is:
from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openff_unconstrained-1.0.0.offxml')
Or, if using the constrained version (program='openforcefield', method='parsley', method=parsley, basis='1.0.2', keywords=['constraints:true']
),
from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openff-1.0.0.offxml')
The mapping table to translate back and forth between the filename the force field refers to and the query you must use to find results for it (or, equivalently, the information contained in the JSON file) is so complicated that even our lead software engineer made at least one mistake in constructing it.
Unless I'm missing something that would easily automate the translation back and forth between the string used to invoke the force field through ForceField
and the rendering of this in QCSchema, the proposed scheme is just too complicated to be practical.
Edit: This is so complicated I also made a mistake in the first version of this post.
from qcengine.
Using openmm-forcefields as a program where it depends on other libraries installed is going to be a nightmare from a Fractal point of view (think hf3c errors, but more systemic).
One other practical point here: The dependencies that openmm-forcefield
requires still need to be installed in order to compute the energies on the QCFractal client, so we're going to run into this issue no matter what. Perhaps we need some way for QCFractal clients to advertise their capabilities?
from qcengine.
@jchodera Fair enough, can you build a table similar to @j-wags so that we can get something concrete to look at implementing here?
from qcengine.
The most important idea is to make sure we keep the prefixes whole (e.g. smirnoff99Frosst-1.1.0
, openff-1.0.0
) so that it is unambiguous how to both find the data and identify the force field that generated it.
Constraints are specified unambiguously in the .offxml
file (formed by prefix + '.offxml'
for SMIRNOFF force fields).
Keywords can instead reflect any special keywords that are used to instantiate system construction for GAFF force fields (added when we switch to openmm-forcefields
in a later PR).
The "Program" can refer to the program that actually computed the energy.
The "Method" can refer to the typing engine that provided the parameters (e.g. smirnoff
vs antechamber
)
The "Basis" can refer to the force field specification string that uniquely identifies the file feeding the "Method" (e.g. openff-1.0.0
, smirnoff99Frosst-1.1.0
, gaff-1.81
, gaff-2.11
).
If we want, we can add "codename" to "keywords" for convenience, but these are just codenames.
Here's an example of how that would look when we expand support with openmm-forcefields
and RDKit
:
Desired force field | Program | Method | Basis | Keywords |
---|---|---|---|---|
smirnoff99Frosst-1.1.0 | openmm | smirnoff | smirnoff99Frosst-1.1.0 | none |
openff-1.0.0 | openmm | smirnoff | openff-1.0.0 | none |
openff_unconstrained-1.0.0 | openmm | smirnoff | openff_unconstrained-1.0.0 | none |
openff-1.0.1 | openmm | smirnoff | openff-1.0.1 | none |
openff-1.1.0 | openmm | smirnoff | openff-1.1.0 | none |
AMBER gaff-1.81 | openmm | antechamber | gaff-1.81 | {'constraints:HBonds', 'nonbondedMethod:NoCutoff'} |
AMBER gaff-2.11 | openmm | antechamber | gaff-2.11 | {'constraints:none', 'nonbondedMethod:NoCutoff'} |
RDKit UFF | rdkit | ForceField | UFF | none |
RDKit MMFF94 | rdkit | ForceField | MMFF94 | none |
RDKit MMFF94s | rdkit | ForceField | MMFF94s | none |
OpenEye MMFF94 | oechem | szybki | MMFF94 | none |
OpenEye MMFF94s | oechem | szybki | MMFF94s | none |
To me, this looks much simpler, and it also keeps the implementation simple and extensible (since there's a simple way to dispatch the basis
field to calculations under the hood.
from qcengine.
Seems structurally good. One item is to consider switching Method/Basis, something like UFF/GAFF make much more sense in the QC world as a method while something like the typing engine refers to the basis.
from qcengine.
Using openmm-forcefields as a program where it depends on other libraries installed is going to be a nightmare from a Fractal point of view (think hf3c errors, but more systemic).
One other practical point here: The dependencies that openmm-forcefield requires still need to be installed in order to compute the energies on the QCFractal client, so we're going to run into this issue no matter what. Perhaps we need some way for QCFractal clients to advertise their capabilities?
I'm not sure how relevant this would be, but it is possible to modify the found
method to additionally take method/driver (or AtomicInput
containing) and judge whether additional deps are available to satisfy that method. This wouldn't help with the broad available methods list from qcng but could help with a quick exit at runtime since found
is called at the start of compute
.
from qcengine.
Seems structurally good. One item is to consider switching Method/Basis, something like UFF/GAFF make much more sense in the QC world as a method while something like the typing engine refers to the basis.
I'm happy to defer to your preferences here! To me, the important principles are simply to keep the prefix needed to form the offxml file on the search path (e.g. smirnoff99Frosst-1.1.0
) or drive SystemGenerator
(e.g. gaff-1.81
) whole.
from qcengine.
@dotsdl @mattwelborn @j-wags @loriab
Any comments before I implement this?
from qcengine.
from qcengine.
This looks good to me. I think it's a good change to require the FF filename/prefix instead of the "media" name, Parsley.
from qcengine.
First pass through whole thread -- sorry I missed where I was pinged.
Proposal looks good (insofar as my limited MM usage extends). I think I agree with switching method/basis columns here. From the QM perspective, method is the guts and basis a detail, and if any is skipped or redundant it should be basis.
from qcengine.
I like where the proposal has landed. @dgasmith please let me know if you want help with implementation or review when the PR is up.
from qcengine.
Related Issues (20)
- Uses deprecated yaml.load()
- Tests look for the program mopac when the mopac package installs the program mopac7 HOT 3
- qcengine/programs/tests/test_programs.py::test_mopac_task runs for 200+ minutes HOT 1
- The test qcengine/programs/tests/test_programs.py::test_psi4_task runs forever
- add r2scan-3c and b97-3c to gcp harness
- handle_output_metadata deletes .extras values if they were returned on a FailedOperation object HOT 1
- Importing qcengine breaks OpenMPI programs HOT 2
- Inconsistent number of threads used by the geometry optimizers HOT 5
- 'psi4 --version' times out HOT 1
- AttributeError: module 'configparser' has no attribute 'SafeConfigParser' when building on Fedora rawhide HOT 2
- Psi4 `_handle_output` stops retries
- Torsiondrive procedure still uses `local_options` HOT 1
- ncores and jobs_per_node
- openmm and pint 0.22 incompatible
- How to get CCSD(T) dipoles via qcengine? HOT 2
- enable rigid optimization with geometric HOT 1
- ESP grid files from psi4. HOT 2
- 2D torsion scan
- The output dictionary from TeraChem does not have the 'return_energy' key in the properties. HOT 1
- Incorrect basis set capitalization in CFOUR harness HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from qcengine.