openforcefield / openff-interchange Goto Github PK

View Code? Open in Web Editor NEW

64.0 22.0 22.0 5.55 MB

A project (and object) for storing, manipulating, and converting molecular mechanics data.

Home Page: https://docs.openforcefield.org/projects/interchange

License: MIT License

Python 100.00%

amber forcefield gromacs interoperability lammps molecular-simulation openmm

openff-interchange's Introduction

OpenFF Interchange

Test status
Code quality
Latest release
User support

A project (and object) for storing, manipulating, and converting molecular mechanics data.

Documentation

Documentation for Interchange, including examples, a brief user guide, release history, and API docs, is available on the OpenFF website. Example notebooks are rendered online among examples from other projects in the OpenFF ecosysytem docs

How to cite

Please cite Interchange using the Zenodo record of the latest release or the version that was used. The BibTeX reference of the latest release can be found here.

Installation

Recent versions of the OpenFF Toolkit (0.11.0+, released August 2022) install Interchange by default through its conda package.

Interchange can also be installed manually via conda (or mamba):

conda install openff-interchange -c conda-forge

Getting started

The Iterchange object serves primarily as a container object for parametrized data. It can currently take in SMIRNOFF or Foyer force fields and chemical topologies prepared via the OpenFF Toolkit. The resulting object stores parametrized data and provides APIs for export to common formats.

from openff.toolkit import ForceField, Molecule
from openff.units import unit

from openff.interchange import Interchange


# Use the OpenFF Toolkit to generate a molecule object from a SMILES pattern
molecule = Molecule.from_smiles("CCO")

# Generate a conformer to be used as atomic coordinates
molecule.generate_conformers(n_conformers=1)

# Convert this molecule to a topology
topology = molecule.to_topology()

# Define periodicity via box vectors
topology.box_vectors = unit.Quantity([4, 4, 4], unit.nanometer)

# Load OpenFF 2.0.0 "Sage"
sage = ForceField("openff-2.0.0.offxml")

# Create an Interchange object
out = Interchange.from_smirnoff(force_field=sage, topology=topology)

# Convert the Interchnage object to an OpenMM System
system = out.to_openmm()

# or write to GROMACS files
out.to_gro("out.gro")
out.to_top("out.top")

# or store as JSON
json_blob = out.json()

For more information, please consult the full documentation.

For more examples specific to Interchange, navigate to the examples/ directory.

Please note that this software in an early and experimental state without a stable API or guarantees of long-term stability.

Copyright

License

The source code of Interchange is hosted on GitHub and is available under the MIT license (see the file LICENSE). Some parts inherit from code distributed under other licenses, as detailed in LICENSE-3RD-PARTY.

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.2.

openff-interchange's People

Contributors

Stargazers

Watchers

openff-interchange's Issues

Non-uniqueness of interpolated parameters

In SMIRNOFF world, given a topology and force field that have already been through the typing step, I've been thinking that there's a 1:1:1 mapping between topology objects (i.e. tuple representing a Bond), SMIRNOFF parameters (i.e. BondType), and parametrized data (i.e. some Potential object, or openmm.Force object). This chain of key-val pairs is not so clear for interpolated parameters (currently proper torsions and harmonic bonds), in which one SMIRNOFF parameter maps onto arbitrarily many parametrized data.

For example, here are a couple lines for a test force field used in the toolkit:

     <Bond smirks="[#6X4:1]-[#6X3:2]" id="b2" k="634.0 * kilocalories_per_mole/angstrom**2" length="1.51 * angstrom     "/>
     <Bond smirks="[#6X3:1]-[#6X3:2]" id="b3" k_bondorder1="100.0*kilocalories_per_mole/angstrom**2" k_bondorder2="4000.0*kilocalories_per_mole/angstrom**2" length="1.5 * angstrom"/>

If 100 bonds in a topology are assigned to have bond b2, they will always be representing by those two physical quantities, no other information needed, and always the same data. But if a collection of 100 bonds are assigned bond b3, each bond may have slightly lengths (and, often, force constants) depending on the partial bond order found in the topology and/or used in the parametrization process.

Some ideas that come to mind

Enforce uniqueness by replacing the SMIRKS key with a key that tacks on the bond order, data from which the parametrized potential could be re-constructed.
Bypass de-duplication altogether

Units: Prototype of typed array model

This prototype is mostly a repackaging of others' work in a few different places (thanks @dgasmith and several in pydantic/pydantic#380. It serves as one way that array-like data can be stored with tagged units, constructed via reasonable attempts at processing imperfect inputs, and look reasonable to the user. Serialization is not yet implemented (aside from free model.dict() stuff, i.e. JSON doesn't work).

https://gist.github.com/mattwthompson/3a7d9959d20d7bbb10b06c894c5e5acc

Related #39

Tests: Nonbonded energies

@mrshirts says: Use a hard (simple, discontinuous) cut-off for electrostatics if it's implemented in the engine. Periodicity doesn't matter, if that's an issue with engines' implementations. InterMol did this for its testing. PME is messy, avoid that if possible for testing.

Interop: Support for non-LJ potentials

After exploring a little bit of the options out there, I think the simplest starting point is just implementing the original Buckingham parameters (here) not because they are groundbreaking but because it's small and bypasses the need for me to write up more complicated force fields (i.e. valence terms dozens of atom types, and UA models).

Amber seems to not support this potential, but GROMACS, OpenMM, and LAMMPS almost certainly each do.

Later I would like to support other non-LJ potentials and something to plug into OpenMM's CustomNonbondedForce.

Replace SMIRNOFFTerm into a more generic base class and inherit from it

The current implementation of the SMIRNOFFTerm class has many shortcomings. It should be separated out into a base class (perhaps PotentialHandler) that specifies a more generic representation, and there should be a subclass that adds in SMIRNOFF specifics. This, not the individual potential, is where the expression should be specified.

Enforcing order of potential handlers

The current pattern in the toolkit is to populate forces by iterating through ParameterHandler objects and calling the create_force method of each. For exporting to OpenMM, this requires some custom code that, among other things, enforces order so that some handlers are only ever called after other ones. I believe, for example, ChargeIncrementModelHandler can only ever apply charge increment after AM1-BCC charges have been applied by ToolkitAM1BCC. This is done by tagging each class with a ._DEPENDENCY attribute that specifies what must come before each, resolving that graph, and using it to specify the other they'll be iterated through.

Maintaining this behavior is probably unavoidable, so the question if there's a better way to implement it or not.

SMIRNOFF: Support charge increments

This is one of the last charge handlers in the toolkits that needs to receive some basic support. Once it does, I think that the to_smirnoff function could be a way to sync up the BCCs from openff-recharge.

Round-tripping sympy.Expr to str

I'd like to be able to digest strings, use sympy to do some expression validation, and convert them back to strings for actual storage. It's not clear how to do this - despite it being obvious, I can't figure out from the docs how to

>>> import sympy
>>> expr = sympy.Expr('x+1')
>>> expr
Expr(x+1)
>>> str(expr)
'Expr(x+1)'
>>> eval(expr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object
>>> # I just want to get "x+1" back

Related #4

PotentialKey id should be allowed to be a tuple of integers.

Related to #114. Currently, I store the Bond, Angle and Dihedral parameters based on atom_indices. It might be nice to be able to support the PotentialKey type to be a tuple of atom indices.

Interop: Documenting non-bonded methods

Work in progress

English	Internal	SMIRNOFF	OpenMM	GROMACS
Electrostatics
PME	`off_sys.handlers["Electrostatics"].method == "PME"`	`<Electrostatics ... method="PME"></Electrostatics>`	`openmm.NonbondedForce.PME`	`coulombtype = PME`
Cut-off	`off_sys.handlers["Electrostatics"].method == "PME"`	`<Electrostatics ... method="Coulomb"></Electrostatics>`	`openmm.NonbondedForce.CutoffPeriodic`, `setReactionFieldDielectric(1.0)`	`coulombtype = cut-off; epsilon-r = 1`
vdW

Tag #103

Tests: Port some existing tests from OpenFF Toolkit

Work in progress - I need to add more to this list.

I'm going through existing tests (mostly test_forcefield.py, I think?) and tracking what behavior should be mimicked here. This issue serves as checklist for which tests I'd like to cover, at least as part of a first pass. I'm mostly focused on the details of parameter assignment and resolve nonbonded/charge methods.

test_charges_from_molecule: Ensure charge generation is skipped, and charges read from Molecule objects, when specified
test_nonintegral_charge_exception: Ensure each molecule has an integral charge (and raise an exception when not)
test_some_charges_from_molecule: Mix charge assignment and using some pre-specified charges
test_library_charges_monatomic_ions: Assign integer charges to monoatomic ions via library charges
test_charge_method_hierarchy: Ensure the hierarchy of charge methods (in SMIRNOFF spec?) is respected
test_parameterize_ethanol_missing_torsion: Ensure that an exception is raised when torsion(s) are left un-parametrized
test_parameterize_mol_missing_stereo_{rdkit|openeye}: Ensure that a molecule missing stereochemistry can be parametrized (by a force field that does not rely on stereochemistry)
test_nonbonded_cutoff_no_box_vectors: Ensure that the NonbondedForce objects use the cutoff specified in the
ParameterHandler, not the OpenMM defaults
test_nonbonded_method_resolution: The big, messy test covering a large matrix of nonbonded methods

@j-wags you may have some insight here, in particular the extreme cases - tests that you think I must merge or others you think I should steer completely away from

Units: Pydantic compatibility

One of a handful of implementation ideas, at the level of getting unit-tagged arrays working well with Pydantic data models, is to use fields that store the unit alongside a unit-less array, instead of carrying the unit-tagged array everywhere. This has some advantages in serialization, can look/"feel" the same to the user, and may make going between raw and tagged representations easier.

I know somebody in the organization has written up some code that implements this. I don't have it on hand.

Dihedrals: Assign separate 1-4 scalings per dihdedral

Some use cases seem to allow each dihedral to have a different 1-4 scaling factor (see ParmEd/ParmEd#1136). This is in conflict with the typical design of having all dihedrals use the same value. Supporting this would require some significant refactoring. Might be worth it to handle this behavior in a separate (sub)class.

Tests: Increase resolution of energy tests

In particular, we should split off the non-bonded section into electrostatics and vdW - merging these two together is a pattern inherited from how OpenMM processes these forces. This makes the data less human-readable and loses data reported by other engines.

Residues broken in GROMACS export

The GROMACS writers don't properly match up the resides between the .gro and .top files, causing issues with multi-component systems. It could be as simple as processing the .topology.mdtop residues in _build_molecule_map

Design: De-duplicate parameterized data based on molecule isomorphism

This is already done in some topology models. It would require some overhead but potentially be a huge win for data compression while allowing for de-compression as needed (i.e. in an OpenMM conversion).

Custom exceptions

It would probably be useful to construct detailed and high-specific custom exceptions and use them over built-in exceptions that are often too broad.

More complex slot-parameter mapping

Revisiting a simple case of dict mapping between "slots" and parameter keys (see https://gist.github.com/mattwthompson/c603d4895d9eafcc37fe7ab8afcb64d8)

In [8]: bonds.slot_map
Out[8]:
{(0, 1): '[#6X4:1]-[#6X4:2]',
 (0, 3): '[#6X4:1]-[#1:2]',
 (0, 4): '[#6X4:1]-[#1:2]',
 (0, 5): '[#6X4:1]-[#1:2]',
 (1, 2): '[#6:1]-[#8:2]',
 (1, 6): '[#6X4:1]-[#1:2]',
 (1, 7): '[#6X4:1]-[#1:2]',
 (2, 8): '[#8:1]-[#1:2]',
 (9, 10): '[#6X4:1]-[#6X4:2]',
 (9, 11): '[#6X4:1]-[#1:2]',
 (9, 12): '[#6X4:1]-[#1:2]',
 (9, 13): '[#6X4:1]-[#1:2]',
 (10, 14): '[#6X4:1]-[#1:2]',
 (10, 15): '[#6X4:1]-[#1:2]',
 (10, 16): '[#6X4:1]-[#1:2]'}

This is straightforward enough for bonds, in which the slots are clearly defined, do not overlap with each other, and can only point to one parameter. One could also reshape this mapping into a matrix in which each slot is a row, each column is a parameter key (for SMIRNOFF, a SMIRKS pattern than uniquely identifies the parameter). This would look something like

Slot	`'[#6X4:1]-[#6X4:2]'`	`'[#6X4:1]-[#1:2]'`	`'[#6:1]-[#8:2]'`
(0, 1)	1	0	0
(0, 3)	0	1	0
(0, 4)	0	1	0
(0, 5)	0	1	0
(1, 2)	0	0	1

and so on. This could capture a few rules specific to the typing of bonds in SMIRNOFF, i.e. the sum of each row is exactly 1.

The matrix view provides can describe an alternative way of storing interpolated parameters. Ignoring the physics, imagine if slot (0, 1) was typed, via bond interpolation, to be partway (let's say 30%) between parameters '[#6X4:1]-[#6X4:2]' and '[#6:1]-[#8:2]'. This could be represented as a matrix like

Slot	`'[#6X4:1]-[#6X4:2]'`	`'[#6X4:1]-[#1:2]'`	`'[#6:1]-[#8:2]'`
(0, 1)	0.7	0	0.3
(0, 3)	0	1	0
(0, 4)	0	1	0
(0, 5)	0	1	0
(1, 2)	0	0	1

But this could not be mapped back to a simple dictionary. There needs to be a way, in general, to

Store multiple parameters can be applied to the same slot
Describe how they are to be applied (i.e. weightings, in this case)
Follow "rules" of the handler (i.e. in this case that each row must add up to exactly 1) or allow the rules to be bent (if these sums should be allow to be greater or less than 1)

Python 3.6 support

JAX officially supports 3.6+ but on conda-forge, it is hard-coded for 3.7+

Options

Soft import JAX
Install JAX from PyPI
Don't support Python 3.6

Serialization: JSON

While using implicit units in models (after #65), JSON serialization is closer to working out of the box. The only blocker is the current OFFTop model, but that should not be a long-term constraint (and it may be possible to get this working in the short term if removing conformer and box data).

Again, this minimal example has the major shortcomings of clearing the topology attribute and not storing any units. But those are expected to be supported behaviors.

This is after #67.

In [1]: from openforcefield.topology import Molecule, Topology

In [2]: from openff.system.stubs import ForceField

In [3]: parsley = ForceField("openff-1.0.0.offxml")

In [4]: top = Topology.from_molecules(2 * [Molecule.from_smiles("CCO")])

In [5]: off_sys = parsley.create_openff_system(top)

In [6]: off_sys.topology = None

In [7]: off_sys.json()
Out[7]: '{"handlers": {"Constraints": {"name": "Constraints", "expression": "", "independent_variables": [""], "slot_map": {"(0, 3)": "[#1:1]-[*:2]", "(0, 4)": "[#1:1]-[*:2]", "(0, 5)": "[#1:1]-[*:2]", "(1, 6)": "[#1:1]-[*:2]", "(1, 7)": "[#1:1]-[*:2]", "(2, 8)": "[#1:1]-[*:2]", "(9, 12)": "[#1:1]-[*:2]", "(9, 13)": "[#1:1]-[*:2]", "(9, 14)": "[#1:1]-[*:2]", "(10, 15)": "[#1:1]-[*:2]", "(10, 16)": "[#1:1]-[*:2]", "(11, 17)": "[#1:1]-[*:2]"}, "potentials": {}, "constraints": {"[#1:1]-[*:2]": true}}, "Bonds": {"name": "Bonds", "expression": "1/2 * k * (r - length) ** 2", "independent_variables": ["r"], "slot_map": {"(0, 1)": "[#6X4:1]-[#6X4:2]", "(0, 3)": "[#6X4:1]-[#1:2]", "(0, 4)": "[#6X4:1]-[#1:2]", "(0, 5)": "[#6X4:1]-[#1:2]", "(1, 2)": "[#6:1]-[#8:2]", "(1, 6)": "[#6X4:1]-[#1:2]", "(1, 7)": "[#6X4:1]-[#1:2]", "(2, 8)": "[#8:1]-[#1:2]", "(9, 10)": "[#6X4:1]-[#6X4:2]", "(9, 12)": "[#6X4:1]-[#1:2]", "(9, 13)": "[#6X4:1]-[#1:2]", "(9, 14)": "[#6X4:1]-[#1:2]", "(10, 11)": "[#6:1]-[#8:2]", "(10, 15)": "[#6X4:1]-[#1:2]", "(10, 16)": "[#6X4:1]-[#1:2]", "(11, 17)": "[#8:1]-[#1:2]"}, "potentials": {"[#6X4:1]-[#6X4:2]": {"parameters": {"k": 531.1373738609999, "length": 1.520375903275}}, "[#6X4:1]-[#1:2]": {"parameters": {"k": 758.0931772912999, "length": 1.092888378383}}, "[#6:1]-[#8:2]": {"parameters": {"k": 669.1415170137999, "length": 1.414287924481}}, "[#8:1]-[#1:2]": {"parameters": {"k": 1120.5832361189998, "length": 0.9707687589944}}}}, "Angles": {"name": "Angles", "expression": "1/2 * k * (angle - theta)", "independent_variables": ["theta"], "slot_map": {"(0, 1, 2)": "[*:1]~[#6X4:2]-[*:3]", "(0, 1, 6)": "[*:1]~[#6X4:2]-[*:3]", "(0, 1, 7)": "[*:1]~[#6X4:2]-[*:3]", "(1, 0, 3)": "[*:1]~[#6X4:2]-[*:3]", "(1, 0, 4)": "[*:1]~[#6X4:2]-[*:3]", "(1, 0, 5)": "[*:1]~[#6X4:2]-[*:3]", "(1, 2, 8)": "[*:1]-[#8:2]-[*:3]", "(2, 1, 6)": "[*:1]~[#6X4:2]-[*:3]", "(2, 1, 7)": "[*:1]~[#6X4:2]-[*:3]", "(3, 0, 4)": "[#1:1]-[#6X4:2]-[#1:3]", "(3, 0, 5)": "[#1:1]-[#6X4:2]-[#1:3]", "(4, 0, 5)": "[#1:1]-[#6X4:2]-[#1:3]", "(6, 1, 7)": "[#1:1]-[#6X4:2]-[#1:3]", "(9, 10, 11)": "[*:1]~[#6X4:2]-[*:3]", "(9, 10, 15)": "[*:1]~[#6X4:2]-[*:3]", "(9, 10, 16)": "[*:1]~[#6X4:2]-[*:3]", "(10, 9, 12)": "[*:1]~[#6X4:2]-[*:3]", "(10, 9, 13)": "[*:1]~[#6X4:2]-[*:3]", "(10, 9, 14)": "[*:1]~[#6X4:2]-[*:3]", "(10, 11, 17)": "[*:1]-[#8:2]-[*:3]", "(11, 10, 15)": "[*:1]~[#6X4:2]-[*:3]", "(11, 10, 16)": "[*:1]~[#6X4:2]-[*:3]", "(12, 9, 13)": "[#1:1]-[#6X4:2]-[#1:3]", "(12, 9, 14)": "[#1:1]-[#6X4:2]-[#1:3]", "(13, 9, 14)": "[#1:1]-[#6X4:2]-[#1:3]", "(15, 10, 16)": "[#1:1]-[#6X4:2]-[#1:3]"}, "potentials": {"[*:1]~[#6X4:2]-[*:3]": {"parameters": {"k": 101.73733623669997, "angle": 107.6607821752}}, "[*:1]-[#8:2]-[*:3]": {"parameters": {"k": 112.36488882409998, "angle": 110.2517797785}}, "[#1:1]-[#6X4:2]-[#1:3]": {"parameters": {"k": 74.28701527176999, "angle": 107.5991506326}}}}, "ProperTorsions": {"name": "ProperTorsions", "expression": "k*(1+cos(periodicity*theta-phase))", "independent_variables": ["theta"], "slot_map": {"(0, 1, 2, 8)_0": "[#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4]_0", "(0, 1, 2, 8)_1": "[#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4]_1", "(2, 1, 0, 3)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0", "(2, 1, 0, 3)_1": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1", "(2, 1, 0, 4)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0", "(2, 1, 0, 4)_1": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1", "(2, 1, 0, 5)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0", "(2, 1, 0, 5)_1": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1", "(3, 0, 1, 6)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(3, 0, 1, 7)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(4, 0, 1, 6)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(4, 0, 1, 7)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(5, 0, 1, 6)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(5, 0, 1, 7)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(6, 1, 2, 8)_0": "[*:1]-[#6X4:2]-[#8X2:3]-[#1:4]_0", "(7, 1, 2, 8)_0": "[*:1]-[#6X4:2]-[#8X2:3]-[#1:4]_0", "(9, 10, 11, 17)_0": "[#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4]_0", "(9, 10, 11, 17)_1": "[#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4]_1", "(11, 10, 9, 12)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0", "(11, 10, 9, 12)_1": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1", "(11, 10, 9, 13)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0", "(11, 10, 9, 13)_1": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1", "(11, 10, 9, 14)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0", "(11, 10, 9, 14)_1": "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1", "(12, 9, 10, 15)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(12, 9, 10, 16)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(13, 9, 10, 15)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(13, 9, 10, 16)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(14, 9, 10, 15)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(14, 9, 10, 16)_0": "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0", "(15, 10, 11, 17)_0": "[*:1]-[#6X4:2]-[#8X2:3]-[#1:4]_0", "(16, 10, 11, 17)_0": "[*:1]-[#6X4:2]-[#8X2:3]-[#1:4]_0"}, "potentials": {"[#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4]_0": {"parameters": {"k": 0.15597152892479998, "periodicity": 1.0, "phase": 0.0}}, "[#6X4:1]-[#6X4:2]-[#8X2H1:3]-[#1:4]_1": {"parameters": {"k": 0.15597152892479998, "periodicity": 1.0, "phase": 0.0}}, "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_0": {"parameters": {"k": -0.4828571351303999, "periodicity": 1.0, "phase": 0.0}}, "[#1:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]_1": {"parameters": {"k": -0.4828571351303999, "periodicity": 1.0, "phase": 0.0}}, "[#1:1]-[#6X4:2]-[#6X4:3]-[#1:4]_0": {"parameters": {"k": 0.10449057473829997, "periodicity": 3.0, "phase": 0.0}}, "[*:1]-[#6X4:2]-[#8X2:3]-[#1:4]_0": {"parameters": {"k": 0.6777540358859999, "periodicity": 3.0, "phase": 0.0}}}}, "vdW": {"name": "vdW", "expression": "4*epsilon*((sigma/r)**12-(sigma/r)**6)", "independent_variables": ["r"], "slot_map": {"(0,)": "[#6X4:1]", "(1,)": "[#6X4:1]", "(2,)": "[#8X2H1+0:1]", "(3,)": "[#1:1]-[#6X4]", "(4,)": "[#1:1]-[#6X4]", "(5,)": "[#1:1]-[#6X4]", "(6,)": "[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]", "(7,)": "[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]", "(8,)": "[#1:1]-[#8]", "(9,)": "[#6X4:1]", "(10,)": "[#6X4:1]", "(11,)": "[#8X2H1+0:1]", "(12,)": "[#1:1]-[#6X4]", "(13,)": "[#1:1]-[#6X4]", "(14,)": "[#1:1]-[#6X4]", "(15,)": "[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]", "(16,)": "[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]", "(17,)": "[#1:1]-[#8]"}, "potentials": {"[#6X4:1]": {"parameters": {"sigma": 1.6998347542117673, "epsilon": 0.10939999999999997}}, "[#8X2H1+0:1]": {"parameters": {"sigma": 1.5332366939195239, "epsilon": 0.21039999999999995}}, "[#1:1]-[#6X4]": {"parameters": {"sigma": 1.3247663938746845, "epsilon": 0.015699999999999995}}, "[#1:1]-[#6X4]-[#7,#8,#9,#16,#17,#35]": {"parameters": {"sigma": 1.2356765220606505, "epsilon": 0.015699999999999995}}, "[#1:1]-[#8]": {"parameters": {"sigma": 0.26726961544210176, "epsilon": 5.269999999999999e-05}}}, "scale_13": 0.0, "scale_14": 0.5, "scale_15": 1.0}, "Electrostatics": {"name": "Electrostatics", "expression": "coul", "independent_variables": ["r"], "charge_map": {"(0,)": -0.09709999710321426, "(1,)": 0.1314300000667572, "(2,)": -0.6013399958610535, "(3,)": 0.044759999960660934, "(4,)": 0.044759999960660934, "(5,)": 0.044759999960660934, "(6,)": 0.017319999635219574, "(7,)": 0.017319999635219574, "(8,)": 0.3980900049209595, "(9,)": -0.09709999710321426, "(10,)": 0.1314300000667572, "(11,)": -0.6013399958610535, "(12,)": 0.044759999960660934, "(13,)": 0.044759999960660934, "(14,)": 0.044759999960660934, "(15,)": 0.017319999635219574, "(16,)": 0.017319999635219574, "(17,)": 0.3980900049209595}, "scale_13": 0.0, "scale_14": 0.833333, "scale_15": 1.0}}, "topology": null, "box": null, "positions": null}'

Type hints/checking

It might be nice to use mypy https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html

Reference data for tests

It is not currently clear to me what should be used as a reference state for testing interoperability. There is currently no reliable way to get system energies in an arbitrary format, i.e. there is no reference set of GROMACS files that new development can be compared to. Maybe a workaround is to compare energies across engines, i.e. use OpenMM energies, from the OpenFF Toolkit, as the reference, even to engines besides OpenMM?

Example:

Sections of two .top files, written to different precisions, product (slightly) different energies

With the current internal writer (having copied precision from InterMol, which writes to 8 digits: https://github.com/shirtsgroup/InterMol/blob/dbbdd8207c1dc8218ba4b9f026746a88414a867d/intermol/gromacs/gromacs_parser.py#L644)

[ bonds ]                                                                                                           
      1       2 1        1.09288838e-01     3.17186185e+05    
      1       3 1        1.09288838e-01     3.17186185e+05    
      1       4 1        1.09288838e-01     3.17186185e+05                  
      1       5 1        1.09288838e-01     3.17186185e+05

as compared to the output of ParmEd (which uses different logic: https://github.com/ParmEd/ParmEd/blob/7152537b3478c67cb34f7a0a89d32a5397d60295/parmed/gromacs/gromacstop.py#L1819-L1820)

[ bonds ]                                                                                                           
      1      2     1   0.10929 317186.185379                  
      1      3     1   0.10929 317186.185379                  
      1      4     1   0.10929 317186.185379                                
      1      5     1   0.10929 317186.185379

These are numerically close, but they produce slightly different results when InterMol is used to evaluate the energies:

ipdb> pmd_energy
OrderedDict([('Bond', Quantity(value=0.092649519444, unit=kilojoule/mole)),  ...
ipdb> internal_energy
OrderedDict([('Bond', Quantity(value=0.092514954507, unit=kilojoule/mole)), ...

These may or may not matter in terms of an actual simulation, but they do fail the default arguments of np.isclose.

#90 (comment)

Parse SMIRNOFF data directly from dict

When building up a ForceField object, the toolkit internally stores the data as a dict. This is currently the only used as an interface between disk (XML) and memory, but may be expanded on in the future. It would be useful to optionally parse that dictionary directly if it contains everything needed to create the corresponding objects here.

Collapse topologically identical molecules in typing maps

The toolkit's Topology class has the concept of a TopologyMolecule which represents a molecule that occurs many times in a topology but is de-duplicated to save memory. We should get this for free as long as we can build directly off of the toolkit's Topology, but we should make sure to also have this behavior mirrored in things like "slot-parameter" maps, i.e. the same SMIRKS pattern is going to be applied to the oxygen in water and this doesn't need to be repeated many times for each water in the system.

GRO file precision

Need to look into how to make it more than 0.001 nm

Python 3.8 Support

I'm pretty sure that OpenMM is the major blocker here, and once its version 7.5 is released, the floodgates will open. The last I checked, some omnia packages were the only blockers.

Interop: Apply separate force fields to different portions of a topology

Use cases:

Apply a separate force field to a ligand and protein
In a mixture of organic liquids, apply separate force fields to either component

Note that applying separate force fields to any chemically bound components is outside the scope of an initial implementation, but a use case that should be covered in the future.

It's unclear to me at the moment if there should be one solution along the lines of system combination or two similar routes to the same result.

Interop: Ryckaert-Bellemans torsions

Not currently in the SMIRNOFF spec, but can store with a new Potential Handler

Flow: Assigning box vectors

I'm starting a series of issues relating to how to best design the API for reasonable flows of data.

Box vectors are fundamental - you can't do virtually any condensed physics simulations without them - but tracking them can be clunky. They are often tracked as part of a topology-like object (OpenMM, ParmEd, mBuild, many file formats), sometimes as optional data. This seems reasonable and makes sense in any conception of a chemical topology, except for one that views it only as a chemical graph. If that is granted, it needs to be tracked separate, and we need to consider how a user would track it through a workflow and how to make the best user experience.

As a starting point, consider how it's currently passed through the toolkit:

omm_sys = forcefield.create_openmm_system(topology=my_top)

This method will look into the topology and, if it has box vectors, tack it into the OpenMM system. If we divorce a topology from its box vectors, this method could look like this

omm_sys = forcefield.create_openmm_system(topology=my_top, box_vectors=mybox)
# similarly ...
off_sys = forcefield.create_openff_system(topology=my_top, box_vectors=mybox)

This does force the user to explicitly carry the box vectors through as a separate object. That's a downside. But it makes is extremely clear what box vectors the final system is going to be populated with, removing any ambiguity about if the topology contains box vectors and if they're going to be passed through to the final system. It also safeguards the possibility of needing to set box vectors after system creation - that sense, the extra argument isn't really extra work.

Branding: Rename to something less generic

The current name, OpenFF System, is descriptive and roughly captures the intent of the project. It is, however, fairly generic and could get confused with other objects and keywords, i.e. OpenMM System, import os; os.system('cmd') in Python, etc.

This may no longer be necessary with the import paths (from openff.system import ...) resulting from our unified namespace.

Storing slots as tuples breaks JSON serialization

Out[12]:
{'name': 'Bonds',
 'expression': '1/2 * k * (r - length) ** 2',
 'independent_variables': {'r'},
 'slot_map': {(0, 1): '[#6X4:1]-[#6X4:2]',
  (0, 3): '[#6X4:1]-[#1:2]',
  (0, 4): '[#6X4:1]-[#1:2]',
  (0, 5): '[#6X4:1]-[#1:2]',
  (1, 2): '[#6:1]-[#8:2]',
  (1, 6): '[#6X4:1]-[#1:2]',
  (1, 7): '[#6X4:1]-[#1:2]',
  (2, 8): '[#8:1]-[#1:2]',
  (9, 10): '[#6X4:1]-[#6X4:2]',
  (9, 12): '[#6X4:1]-[#1:2]',
  (9, 13): '[#6X4:1]-[#1:2]',
  (9, 14): '[#6X4:1]-[#1:2]',
  (10, 11): '[#6:1]-[#8:2]',
  (10, 15): '[#6X4:1]-[#1:2]',
  (10, 16): '[#6X4:1]-[#1:2]',
  (11, 17): '[#8:1]-[#1:2]'},
 'potentials': {'[#6X4:1]-[#6X4:2]': {'parameters': {'k': 517.2187207483 <Unit('kilocalorie / angstrom ** 2 / mole')>,
    'length': 1.520980132854 <Unit('angstrom')>}},
  '[#6X4:1]-[#1:2]': {'parameters': {'k': 754.0714751826 <Unit('kilocalorie / angstrom ** 2 / mole')>,
    'length': 1.093910524997 <Unit('angstrom')>}},
  '[#6:1]-[#8:2]': {'parameters': {'k': 664.3946131079 <Unit('kilocalorie / angstrom ** 2 / mole')>,
    'length': 1.429330393438 <Unit('angstrom')>}},
  '[#8:1]-[#1:2]': {'parameters': {'k': 1111.356329629 <Unit('kilocalorie / angstrom ** 2 / mole')>,
    'length': 0.9713231822139 <Unit('angstrom')>}}}}

In [13]: bonds.json()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-3451bae00247> in <module>
----> 1 bonds.json()

~/miniconda3/envs/openff-system/lib/python3.7/site-packages/pydantic/main.cpython-37m-darwin.so in pydantic.main.BaseModel.json()

~/miniconda3/envs/openff-system/lib/python3.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237         separators=separators, default=default, sort_keys=sort_keys,
--> 238         **kw).encode(obj)
    239
    240

~/miniconda3/envs/openff-system/lib/python3.7/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

~/miniconda3/envs/openff-system/lib/python3.7/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

TypeError: keys must be str, int, float, bool or None, not tuple

Seems like they should be cast to strings in order to be friendly to JSON, which has no concept of tuples. Custom classes with a simple string representation could be another approach.

Design: Expand functionality of PotentialKey

Two critical use cases:

Map a TopologyKey to a combination of potentials. This could involve a similarly-structured class that itself maps to individual keys with weighting coefficients, i.e. two torsion parameters in an interpolated torsions
#129 (comment)

Units: Third-party library

It really comes down to three options at the moment:

Interoperability: Espaloma

@yuanqing-wang has already developed round-trips between espaloma.Graph and openmm.System

These could serve as inspiration for parallel round-trips here.

"meta" handlers

The prototype of a PotentialHandler for bonds bypasses several issues, one of them being that parts of a force field are not always able to operate independently of each other. The most difficult example of this, for us, is electrostatics, in which one could think of a "meta" handler (ElectrostaticsMetaHandler, or something) that itself includes potentially several handlers (library charges, AM1-BCC, charge increments, etc.). (There may also be a meta handler for VDW terms, in which atoms and virtual sites are treated differently). This would provide one interface to the results of the child classes, i.e. a single place to get the charges that a force field would ultimately apply to the system.

It's unclear to me if this class should inherit from PotentialHandler or just be another instance of it. I learn toward the former, as a meta handler won't have individual parameters of its own, and therefore it doesn't make sense to expose methods that rely on that data.

SMIRNOFF: Handling torsions

Torsions are trickier to deal with than bonds or angles because a parameter can have an unlimited number of terms. (In practice, this seems to top out around 5, but the implementation should be pick arbitrarily nasty parameters by nature of an "N >= 2" treatment).

Just templating in dummy terms for all torsions, i.e. expanding all to ~5 or so terms and zeroing out the un-used values, would enable digesting the data but would be more expensive than necessary. It's also not straightforward to constrain the dummy zero values to zero so that an optimizer doesn't magically turn a single-term torsion into a double-term torsion.

Another route could involve making separate classes for each type of term - one for each number of terms a torsion could have. This would involve a lot of redundant code, might not enable tuning all torsions at once, and ultimately not be very flexible.

The implementation in timemachine separates out the multi-term torsions into individual terms seems to be the simplest. It requires some bookkeeping for going between each representation, but is a cleaner interface to the actual values. This seems to be the one to go with.

https://github.com/proteneer/timemachine/blob/e9cec4e1ddfb952d6615244d5fea6ae889c07c29/ff/handlers/bonded.py#L72

Lightweight library for parsing arbitrary anlytical function forms

GMSO uses sympy, a stable and popular symbolic math library, to parse analytical expressions of arbitrary functional forms (i.e. everything between non-LJ non-bonded terms or whatever wild torsion form you feel like drawing up). I think GMSO has worked well with it, but it is a fairly heavy dependency to pull in and it is worth looking into alternatives.

Functionality I think we care about

Parsing arbitrary analytical expressions into some useful internal representation
Comparing expressions and individual variables to ensure an expression is fully described with parameters
Equality-like comparisons between expressions for interfaces (see how it is done in GMSO here)

Dealing with partially/un-parametrized systems

Consider the current constructor:

class System(BaseModel):
    """The OpenFF System object."""

    topology: Union[Topology, OpenMMTopology]
    forcefield: Union[ForceField, ParameterHandler] = None
    slot_smirks_map: Dict = None
    smirks_potential_map: Dict = None
    term_collection: SMIRNOFFTermCollection = None
    positions: UnitArray
    box: UnitArray

This is effectively a topology, a handful of ways to deal with force field data and typing information, positions, and box vectors. It may seem counter-intuitive, but in order to facilitate #19 it may be valuable to support states with no parameters, in addition to those with some but not all.

# specify a topology, force field, positions, and box vectors

mysystem = System(topology, positions, box)  # Create an "empty" system

# Apply select ParameterHandler objects, instead of the entire force field all at once
mysystem.apply_parameter_handler(forcefield['vdW'])
mysystem.apply_parameter_handler(forcefield['Bonds'])
...

This may be feasible with a subclass from System or perhaps a common ancestor that specifies the common components. Or maybe composition over inheritance.

SMIRNOFF: Resolving nonbonded methods

Tag openforcefield/openff-toolkit#380

Partial application of a force field

I have a chemical topology and a force field and I want to get a parametrized system, out, but I only want to get the electrostatics, so don't bother with the other terms

Data: Where to store constraints?

Every piece of software seems to do this a little differently. In the scope of this project, let me start with a few ideas:

Store it on the topology, and add something like a boolean attribute .constrained in the parametrization step
Have a "potential handler" that just stores the implied intent of a constraint (at least the atom indices/bond index, and optionally the constraint length) and leave it to the interoperability layer to process that information correctly
Some combination of the above, possibly just duplicating the data
Store the constraint information inside bond potentials alongside the other data, making the force constant optional

Design: Using pre-defined data on input molecules

Some quantities relevant to force field parametrization (partial charges, partial bond orders, aromaticity, etc.) can be calculated on the fly with fully-defined openff.toolkit.topology.molecule.Molecule objects. Some/all of these can also be pre-defined beforehand, i.e. partial charges read from an SDF file. There's a paradigm in the toolkit of allowing these to be passed in to create_openmm_system via kwargs, i.e.

omm_system = forcefield.create_openmm_system(
    topology, charge_from_molecules=molecules,
    partial_bond_orders_from_molecules=molecules,
    toolkit_registry=toolkit_registry
)

There's some grey territory with respect to reproducibility when the definition of a force field conflicts with this usage, i.e. using partial charges from a file on disk vs. using AM1-BCC if prescribed by the force field.

The implementation of these methods has proven to be really tricky, as evidenced by several headaches and lines of code in the toolkit. Determining how charge methods override each other, determining how partial bond orders are calculated with one method in one handler should or should not affect interpolated parameters in another handler, all while considering whether or not to use the charges/bond orders/etc. specified in the topology.

In a perfect world, a simpler approach would be to completely de-couple these steps such that i.e. the partial bond orders are known before parameter assignment takes places and partial charges are resolved before the electrostatics handlers are created. A current implementation assumes that these data exist in the topology - not because it's best to read them from there, but because it's the simplest solution for now and allows this two-step process to be implemented in the future.

JSON serialization of pint.Quantity not behaving

Relevant for storing unit-tagged values

from pydantic import BaseModel, parse_obj_as


class PintClass(BaseModel):

    val: pint.Quantity

    class Config:
        arbitrary_types_allowed = True
        
pint_class = PintClass(val=1*u.dimensionless)
pint_class.dict()
# out: {'val': 1 <Unit('dimensionless')>}

assert pint_class == parse_obj_as(PintClass, PintClass(val=1*u.dimensionless).dict())

parse_obj_as(PintClass, PintClass(val=1*u.dimensionless).json())
# out: TypeError: Object of type 'Quantity' is not JSON serializable

The pint docs recommend using strings as an intermediate to serialization formats, although this is only safe if we trust it to be saved and read with the the same registry

build_smirnoff_collection is really slow

It takes roughly a minute to convert Parsley over, which is not acceptable as it's going to be the most common use case for a while. I suspect it's because the sympy checking is done on every term, but there are probably other performance gains to be had.

Other ideas may be to

move the expression logic into handler headers
not construct the entire collection at init time, but convert each parameter one-by-one as they're used in the system
parallelize? send each handler off to different threads as separate tasks

Dealing with unit-less representations

While we should have everything tagged with units when possible, there are likely to be cases in which stripping units is necessary, i.e. casting to a jax.numpy object:

>>> import jax.numpy as jnp
>>> import numpy as onp
>>> import pint
>>> u = pint.UnitRegistry()
>>> vals = onp.random.rand(4) * u.meter
>>> jnp.array(vals)
/Users/mwt/software/miniconda3/envs/openff-system/lib/python3.7/site-packages/jax/numpy/lax_numpy.py:2116: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  object = np.array(object, dtype=dtype, ndmin=ndmin)
/Users/mwt/software/miniconda3/envs/openff-system/lib/python3.7/site-packages/jax/lib/xla_bridge.py:125: UserWarning: No GPU/TPU found, falling back to CPU.
  warnings.warn('No GPU/TPU found, falling back to CPU.')
DeviceArray([0.776736  , 0.5026873 , 0.08218545, 0.91840476], dtype=float32)
>>> # Same thing happens by default with original NumPy
>>> onp.array(vals)
__main__:1: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
array([0.77673604, 0.50268727, 0.08218545, 0.91840475])

my initial concerns are

Ensuring going from a stripped external representation to a tagged internal representation reliably produces the same units
Possible necessity, and downsides, of a set of "default" units used in stripping units away
User experience when tagged representations aren't necessarily essential (what should be enforced?)

Interop: Gromacs ITP files

It might be nice to support writing out to ITP files, which could help against data explosion and the atom type compression problem.

https://manual.gromacs.org/documentation/current/reference-manual/topologies/topology-file-formats.html#molecule-itp-file

Units: Setting/constructing values with SimTK units

A minor oversight in #77 was not allowing the input of SimTK quantities (reproduction below). It should be feasible to convert this data inside of the validator as another elif block. There may be downsides associated with letting an extra data type creep in and being responsible for managing them, i.e. disallowing them altogether would shift that burden to the user.

In:

import numpy as np
from openforcefield.topology import Molecule
from simtk import unit as omm_unit

from openff.system.stubs import ForceField

top = Molecule.from_smiles("C").to_topology()

parsley = ForceField("openff-1.0.0.offxml")

out = parsley.create_openff_system(top)

out.box = [4, 4, 4] * np.eye(3) * omm_unit.nanometer

Out:

Traceback (most recent call last):
  File "repro.py", line 13, in <module>
    out.box = [4, 4, 4] * np.eye(3) * omm_unit.nanometer
  File "pydantic/main.py", line 394, in pydantic.main.BaseModel.__setattr__
pydantic.error_wrappers.ValidationError: 1 validation error for System
box
  Could not validate data of type <class 'simtk.unit.quantity.Quantity'> (type=value_error)

Where should positions go?

I'm going to beef this up and edit it down later.

We keep looping around this question, which seems innocent and straightforward at a high level - in #29 I pretty much just say "yeah, we need positions" - but I'm not satisfied with any of my current ideas for implementation. Coupled is the question of whether or not the toolkit's Topology class should include positions as a high-level API point. My current opinion is largely against it and I will attempt to elaborate later.

Existing approaches

Each atom knows its position at a core level. This the implementation used by GMSO.
Positions are stored elsewhere, but meant to be coupled. I think this is the more common approach, i.e. how engines typically store them in separate files.
I don't think there are any other notably different approaches? Lists look better when they have more than two items, though.

Existing pain points

Carrying around lots of data. A user loading in i.e. a PDB file to be thrown at a force field may find it frustrating that the positions aren't kept through the workflow and must be re-attached to the OpenMM simulation long after the file was loaded in. I bring this up not to criticize OpenMM's approach, but to point out that this is more or less the user experience that we should be comparing to; it would be better to have a more streamlined flow, worse to have a clunkier one, and acceptable to have one that's similar.
Keeping data in sync. If a topology is modified, i.e. an atom added, a separate array of numbers is no longer a valid description of the positions of those atoms. Should an error be thrown, should modifications to the topology require corresponding updates to the positions, or should sync only be enforced at particular points (i.e. serialization?)
Mismatch of useful data. Most of the time, MM only needs a basic description of the topology (atoms, bonds, maybe residues). Barring researching exploring new things, an MM view of a system doesn't need things like the aromaticity model, partial bond orders, or even stereochemistry.
Accessibility of data. Lookups, atom selection, something or other.

Core questions

How valid is a System object without position data?
Most or all of the important use cases require some sort of export to an energy evaluator, which requires position data. It's probably best, in general, to allow for the creation of an empty system (my_sys = System()) and ship an API that allows things to be built up piece-by-piece instead of requiring everything be done in a single, massive constructor call.
What are our key use cases, and for each, how can we expect position data to be brought to us, what are the guarantees we need to make to our users about it, and how should we build out interfaces to deal with these cases? I'll expand on this later.
(A question for the toolkit) do positions fit into a reasonable description of a chemical topology? I don't think so.

Options

Always assume that positions are stored in the topology.
If we got with something like a toolkit Topology object stored as a topology attribute, this offloads most of the work to the toolkit. We'd expect this to always keep things in sync, always have the right and enough data, and not allow positions to be set except for through the Topology API. There would be no System.positions attribute.
Store positions in a high-level positions attribute.
This would involve having a System.positions attribute that could take in data from a few different sources, but must stay in sync with the topology to some extent. Therefore, the responsibilities of processing, exporting, and maintaining
Do both 1. and 2.
This seems like a terrible idea as it would involve unnecessary duplication of data and likely be confusing for the user if it works at all.
Do 1., but have a System.positions attribute that contains pointers
This would mostly solve the problem of syncing data, but may be tricky to implement in a way that's sensible to the user. It also isn't much of a solution, as it still exports the responsibility (and control) of implementation to the toolkit.

Right now I like 2., as I think that positions should be kept separate from the other components of a system. This seems to be the most straightforward way of directly gobbling up an array ("well, I have this Nx3 list of coordinates from reading the PDB file, I'll just put it here") and exposing it to the user. This also keeps the door open for other programs to handle the positions, i.e. replacing conformers in a single-molecule system or replacing coordinates with re-packed coordinates (as calling PACKMOL or your favorite equivalent won't normally modify the topology).

Design: Structure of interoperability layer

(in progress)

Before setting out on its implementation, I'd like to consider how the code should be architected to be most healthy long-term.

Necessary features:

Built off of switchable backends (including our own)
Gracefully fails when trying to export with insufficient data
Accurate and consistent with other liibraries to some high threshold
Reasonably maintainable

Some features that would be nice to have:

Minimal effort associated with adding a new backend
Logging which backend(s) were used
Encoded hierarchies of preferred backends for particular formats
User control over using unsafe or untrustworthy writers

I'm currently struggling with the decision of whether this should be implementing with a registry pattern, similar to how the OpenFF Toolkit wraps cheminformatics toolkits, or something like the factory design pattern (or if they are in fact identical).

Host-guest system setup example

From an already-existing complex, potentially with charges provided for the receptor but not the ligand.

openforcefield / openff-interchange Goto Github PK

openff-interchange's Introduction

OpenFF Interchange

Documentation

How to cite

Installation

Getting started

Copyright

License

Acknowledgements

openff-interchange's People

Contributors

Stargazers

Watchers

Forkers

openff-interchange's Issues

Example:

Related:

Existing approaches

Existing pain points

Core questions

Options

Recommend Projects

Recommend Topics

Recommend Org