mosdef-hub / gmso Goto Github PK

View Code? Open in Web Editor NEW

51.0 51.0 32.0 13.32 MB

Flexible storage of chemical topology for molecular simulation

Home Page: https://gmso.mosdef.org

License: MIT License

Python 98.11% Dockerfile 0.11% Shell 0.02% TeX 1.75%

gmso's People

Contributors

Stargazers

Watchers

gmso's Issues

Site selection

A major feature we want to have robust and wide support for is site (atom) selection. What comes to mind at the moment includes:

What else do we want to be able to do?

Gromacs Writer Triclinic cell support

Gromacs supports all triclinic boxes, assuming they adhere to the following inequality:

All triclinic boxes can be translated/rotated in such a way to satisfy this inequality.

Will need a method to take the topology box and make sure the system satisfies the above inequalities.

Serializing Topologies

Something to think about down the line. May probably require changing some data types and classes.

Mixing Rules and CG systems

Currently, our parmed backend only can handle two types of mixing rules, which is due to parmed.

However, with some CG simulations, mixing rules are not used at all. They specify the pairwise non-bonded interactions for each pair of types explicitly.

We should add support to handle this.

`Compound` name using `from_mbuild`

When initializing a Topology using from_mbuild, Topology.name = Compound.

Not sure how this should be changed, but the name should be something other than Compound to avoid confusion.

SymPy Gotchas

There are some nuances of SymPy that we just need to keep an eye on. Most of these I dont think will be an issue, but there are some that we might run into.

For example, if we have some scaling factor thrown in that is just a fraction, python will evaluate the numbers before SymPy will. The solution is to just sympify one or both of the numbers beforehand.

Will probably just need to document this a bit when the time comes.

Will also be useful to create a few of the potential forms already and have them as ones that can be used from the start (LJ, Mie, etc.)

Storing box vectors

After merging #1 and #20

====================================================== test session starts ======================================================
platform darwin -- Python 3.6.8, pytest-4.1.1, py-1.5.3, pluggy-0.7.1 -- /anaconda3/envs/mosdef/bin/python
cachedir: .pytest_cache
rootdir: /Users/mwt/software/topology, inifile:
plugins: cov-2.6.0
collected 23 items / 22 deselected                                                                                              
run-last-failure: rerun previous 1 failure

topology/tests/test_box.py::TestBox::test_vectors FAILED                                                                  [100%]

=========================================================== FAILURES ============================================================
_____________________________________________________ TestBox.test_vectors ______________________________________________________

self = <topology.tests.test_box.TestBox object at 0xa21d6e748>

    def test_vectors(self):
        box = Box(lengths=np.ones(3), angles=[40.0, 50.0, 60.0])
        vectors = box.full_vectors_from_angles()
        test_vectors = np.array([[1, 0, 0],
                                [0.5, 0.86603, 0],
                                [0.64278, 0.51344, 0.56852]])
>       assert np.isclose(box.vectors(), test_vectors)
E       AttributeError: 'Box' object has no attribute 'vectors'

topology/tests/test_box.py:49: AttributeError
======================================================= warnings summary ========================================================
/Users/mwt/software/plyplus/plyplus/plyplus.py:758
  /Users/mwt/software/plyplus/plyplus/plyplus.py:758: DeprecationWarning: invalid escape sequence '\.'
    return codecs.getdecoder('unicode_escape')(token_value)[0]
  /Users/mwt/software/plyplus/plyplus/plyplus.py:758: DeprecationWarning: invalid escape sequence '\d'
    return codecs.getdecoder('unicode_escape')(token_value)[0]
  /Users/mwt/software/plyplus/plyplus/plyplus.py:758: DeprecationWarning: invalid escape sequence '\('
    return codecs.getdecoder('unicode_escape')(token_value)[0]
  /Users/mwt/software/plyplus/plyplus/plyplus.py:758: DeprecationWarning: invalid escape sequence '\)'
    return codecs.getdecoder('unicode_escape')(token_value)[0]
  /Users/mwt/software/plyplus/plyplus/plyplus.py:758: DeprecationWarning: invalid escape sequence '\+'
    return codecs.getdecoder('unicode_escape')(token_value)[0]
  /Users/mwt/software/plyplus/plyplus/plyplus.py:758: DeprecationWarning: invalid escape sequence '\*'
    return codecs.getdecoder('unicode_escape')(token_value)[0]

-- Docs: https://docs.pytest.org/en/latest/warnings.html
====================================== 1 failed, 22 deselected, 6 warnings in 2.20 seconds ======================================

This can be fixed easily enough (assert np.isclose(vectors, test_vectors)) so I don't consider this a problem. But are we planning to include box.vectors as an attribute of the box class?

Supported Python versions

Python 2.7: I think it's worth putting a small amount of work into at least seeing if this can work. Our dependencies all still supported this and I don't think our code has anything that's Python 3 exclusive. Dropped by unyt as of 2.0.0.
~~Python 3.3:~~ This and older versions of Python 3 are not worth considering.
~~Python 3.4~~: ~~I don't have a strong feeling about but it should (:tm:) work.~~ Could work if desired but dropped by unyt as of 2.0.0.
Python 3.5: Getting old but I and others are still using it on a daily basis. I think we should support it for a while.
Python 3.6: Should be supported without a doubt.
Python 3.7: The ecosystem should be stable enough to include this without major issues.
Python 3.8: Currently in alpha; not worth worrying about at the moment but it's on the horizon.

handling addition of topologies

I just tried adding water and ion parmed structures together and writing out to a PDB file for use with OpenMM. It seems that when I did so, the water molecules were written out to the pdb as a single molecule. As a result, OpenMM wasn’t able to map any of these waters to their respective atom types.

There might be a way to handle this through Parmed, but I wanted to just document this as we might be able to handle this in a better way when we add two topologies together.

Methods to check validity against supported engines

At a simple level, we could have a function top.compatable_engines()/top.compatable_formats() that calls various writers' validation functions to see if a topology could be written to an engine. Maybe at different levels of yes/no/maybe

A build-off of this could be some user warnings while changing things, for example if an AtomType.nb_function is changed from something that everything supports (i.e. Lennard-Jones) to something that some support (i.e. Mie) to something that nobody supports (a typo potential).

Possible grammar bug

Anybody seen this?

sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/mwt/software/plyplus/plyplus/grammars/python.g' mode='r' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/mwt/software/plyplus/plyplus/grammars/config.g' mode='r' encoding='UTF-8'>
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/mwt/software/plyplus/plyplus/grammars/selector.g' mode='r' encoding='UTF-8'>

I'm not sure how exactly to reproduce it but it comes up sometimes after I add import unyt as u to files. I have some half-baked plyplus things going on in this conda environment but they shouldn't, I don't think, affect this package.

Potential term support

Different engine formats and external packages have slightly different functions, which will affect the resultant parameters for I/O:

Bond potentials

engine/package	functional form	comments
internal	V = (1/2) * k * (r-r_eq)**2	Default form in `BondType`
gromacs	V = (1/2) * k * (r-r0)**2	gmx bondtype 1
lammps	V = k * (r-r0)**2	bond_style harmonic
hoomd	V = (1/2) * k * (r-r0)**2	hoomd.md.bond.harmonic
parmed	V = k * (r-r0)**2	This code
openmm xml	V = (1/2) * k * (r-r0)**2	OpenMM 7.0 Theory guide

Angle potentials

Internally, k in units of kj/deg**2 and theta_eq in units of deg

engine/package	functional form	comments
internal	V = (1/2) * k * (theta-theta_eq)**2	Default form in `AngleType`
gromacs	V = (1/2) * k * (theta -theta0)**2	gmx angletype 1
parmed	V = k * (theta-theta0)**2	This code

Dihedral potentials

Potential Templates

name	functional form	parameters	comments
LennardJonesPotential	V = 4 * epsilon * ((sigma/r)12 - (sigma/r)6	epsilon, sigma
MiePotential	V = (n/(n-m)) * (n/m)*(m/(n-m)) epsilon * ((sigma/r)n - (sigma/r)m)	n, m, epsilon, sigma
BuckinghamPotential	V = a * exp(-b * r) - c * r**-6	a, b, c
HarmonicBondPotential	V = 0.5 * k * (r-r_eq)**2	k, r_eq
HarmonicAnglePotential	V = 0.5 * k * (theta - theta_eq)**2	k, theta_eq
HarmonicTorsionPotential	V = 0.5 * k * (phi - phi_eq)**2	k, phi_eq	phi_cis = 0
PeriodicTorsionPotential	V = k * (1 + cos(n * phi - phi_eq))**2	k, n, phi_eq	phi_cis = 0
OPLSTorsionPotential	V = k0 + 0.5 * k1 * (1 + cos(phi)) + 0.5 * k2 * (1 - cos(2 * phi)) + 0.5 * k3 * (1 + cos(3 * phi)) + 0.5 * k4 * (1 - cos(4 * phi))	k0, k1, k2, k3, k4	phi_cis = 0
RyckaertBellemansTorsionPotential	V = c0 * cos(phi)*0 + c1 cos(phi)*1 + c2 cos(phi)*2 + c3 cos(phi)*3 + c4 cos(phi)*4 + c5 cos(phi)**5	c0, c1, c2, c3, c4, c5	phi_trans = 0

Method to check for total charge of system (charge of subtopology as well)

Would be useful to check the charge of the topology or subtopologies.

Abstract classes for readers and writers

It might be useful to have some simple abstract classes to better define the structure of our readers/writers. For now I'm thinking something like:

from abc import ABC, abstractmethod

class Writer(ABC):
    def __init__(self):
        super(Writer, self).__init__()

    @abstractmethod
    def write(self, filename):
        pass

And then for like the GRO writer we would have:

class GromacsGroWriter(Writer):
    def write(self, filename):
        # Insert logic here for writing the GRO file

At the moment, there doesn't seem to be a ton of structure we would need to define in the base class, but I feel like this could be useful as:

It would make our reader/writer code a bit more organized
As the codebase evolves, we may decide on additional attributes/methods that the readers/writers should share
Related to (1), but having a better-defined structure for our readers/writers should make it easier for other programmers to contribute readers/writers

numpy comparisons with unyt

While fixing #21, I encountered this. I'm not sure if unyt is meant to play this well with numpy but it may be worth doing this with our own testing function - just comparing values doesn't necessarily mean they're values in the same units

(Pdb) vectors
unyt_array([[1.        , 0.        , 0.        ],
            [0.5       , 0.8660254 , 0.        ],
            [0.64278761, 0.51343833, 0.56851136]], 'nm')
(Pdb) test_vectors
unyt_array([[1.     , 0.     , 0.     ],
            [0.5    , 0.86603, 0.     ],
            [0.64278, 0.51344, 0.56852]], 'nm')
(Pdb) np.allclose(test_vectors, vectors, atol=1e-3)
*** unyt.exceptions.UnitOperationError: The <ufunc 'add'> operator for unyt_arrays with units "dimensionless" (dimensions "1") and "nm" (dimensions "(length)") is not well defined.
(Pdb) np.allclose(test_vectors.value, vectors.value, atol=1e-3)
True

Indexing bug in `from_mbuild`

When initializing a Topology from mBuild using from_mbuild, I noticed that the position array is different from when Topology is initialized without the use of mBuild.

For example:

topology = from_mbuild(system) and topology.site_list[0].position[0] outputs: unyt_array([2.0470509, 1.8087942, 0.4239625], 'nm')

while

topology = Topology() and topology.site_list[0].position[0] outputs:
unyt_quantity(1., 'nm')

Class equivalences

We should try to override __eq__ for each of the core classes

Have a `name` attr with our potential terms

something that might be a nice feature in the future is to have a name attr with our potential terms. So this could have a name 12-6 LJ instead of having to manually put it in here.

Originally posted by @justinGilmer in #68

Custom Exceptions

We'll probably come across special cases we'll want our own errors and warnings for. mbuild has a simple one but foyer has several specific ones associated with various validation issues

https://github.com/mosdef-hub/foyer/blob/dfc09e9c33705933c74010ce8a8c050081df8942/foyer/exceptions.py

Coordinate Transforms

One operation that is pretty common for most systems are coordinate transformations. Currently, we do not support any operations like translations, rotations, or reflections.

Writing tabulated potentials from expressions

"LAMMPS-style" box vectors

Not sure what they're called, but I think this block of code in HOOMD world is setup the same way LAMMPS does (about 2/3 of the way down this page https://lammps.sandia.gov/doc/Howto_triclinic.html). Maybe the box class should build these, analagous to how it builds vectors?

Need setters and getters for Topology.name

We agreed that for as many class attributes as we can, we should have @property and @setter methods for each attribute. Topology.name currently does not have that support.

Use of graphs

Have one graph for actual bond connections - could be useful for looking at bonded neighbors X-edges away.

Have another graph for the hierarchy - could be useful for finding which sites belong to which topologies/subtopologies

Only library I can currently think of is networkx, will maybe have to consider memory management/speed. Also will have to consider directionality in bond graphs or hierarchy graphs

GOMC support

Needs:

Be able to read .par files (Parmed does this, but doesn't support some of the functionals that GOMC/NAMD need to use)
.par files limit atomtypes to only 4 characters, so this could potentially cause issues for writer
On the bright side, I think Parmed can read .par files, create the ParameterSet, and then output the OMM FFXML
GOMC workflows have to involve manually specifying atomtypes in their topology files. They use a lot of CHARMM atomtypes, so I think there would be a need to add SMARTS definitions for the atomtypes, or just use the hackyCG method that I use for specifying charmm atomtypes
Current GOMC work flows do take 'prototype' molecules and call packmol already, but use VMD to specify parameters
The need to accommodate Mie nonbonded potentials, and making sure we can read this in a FFXML. We might not need to parse the tag like "HarmonicBondForce" or "CustomBondForce", but rather just needing to parse the energy expression via sympy

For reference:
GOMC documentation

Update topology against parts of other objects

Stemming from a discussion @rmatsum836 and I just had -

The current convert/ module is designed to make a new object from another one, i.e. making a new topology from a mb.Compound that already exists. What if instead of making a completely new object, we allowed part of one to be stored into the other? Like update_coordinates in mbuild but between two objects instead of from disk.

I have a topology with the metadata/hierarchy/forcefield/etc I want but I had to initialize everything at the origin, I want to use mb.fill_box to get a reasonable starting configuration.
I have a Compound that has the hierarchy I want, but for some reason all the other information of my system is contained in a topology.
I have a pmd.Structure that includes everything I want, but the non-LJ potential I am forced to use is only supported by topology.

In each case, making a completely new object either would lose some important information, and instead we'd just want to take a part of one and put it into the other. Seem doable piece-by-piece, or crazy?

Unit systems

There is some interest in having an internal "unit system" concept, similar to how LAMMPS handles units. Proper handling of units (#4) will make this trivial when writing/reading to different file formats and converting to external structures, but it is reasonable to expect users to query values while tinkering with systems. (For example, in looking at topology.positions or some force constant in an angle potential, some people may prefer some units over others.) I almost only see two unit systems (nm/kJ and A/kcal) used, but in the future we may want to entertain other systems. For example, eV units in metal systems or maybe even dimensionless values in LJ systems.

We should start using only nm and kJ and at some point in the future think about being able to switch between them.

File format support

I don't think GitHub has the concept of a pinned issue, but I want to have a master tracker of the status of support for various file formats.

Extension	Engine	Typed or Un-typed?	Internal reader	Internal writer
.xyz	many	Un-typed	✅ `8bc9360`	✅ #46
.mol2	many	Either
.gro	GROMACS	Un-typed	✅#26	✅ #48
.top	GROMACS	Typed
.pdb	many	Un-typed
.data	LAMMPS	Typed		✅ #28
.gsd	HOOMD	Typed		✅ #47
?	Cassandra	Both?
.par	GOMC/NAMD? #78	Typed
.psf	GOMC/NAMD #78	Un-typed
.zmat	ilff/fftool	Un-typed

Custom/non-element "Elements"

For non-atomistic systems, the ParmEd/MDTraj approach is to defined a custom "Element" that is an instance of some Element class but does not correspond to anything physically - instead it just tracks a name and a mass (which is all we think we need to care about).

My idea was to have a special Element-like class that inherits from our Element class but only includes the information we care about.

Alex suggested that, because physically a UA/CG bead doesn't really include the physical concept of an element, maybe we shouldn't have them as elements? A Site can include the mass of the bead and other things can probably just be tied to that Site (or, in some situations, instead of AtomType). I like this approach - it makes sense and is consistent with the "be able to specify things but also fall back if needed" sort of design we use elsewhere.

Renaming `site`

https://docs.python.org/3/library/site.html

Looks like we named a class something that's already taken in the standard python library, so we'll need to rename site to something else - at least site.py will have to change, maybe the class can stay.

It's strange that we haven't broken anything yet.

Doing the calculations twice for tilt factors in lammpsdata.py

Will raise a PR soon, but the code here should not be needed anymore. Box handles all of those calculations. We appear to overwrite those values as well.

Handling molar quantities with unyt

I think this should work fine, but it makes me nervous so I'm raising an issue in case we need to keep track of it. See the original conversation:

>>> my_mass = 5 * u.Unit('g/mol')
>>> my_mass
unyt_quantity(5., 'g/mol')
>>> my_volume = 100 * u.Unit('nm**3')
>>> my_density = my_mass/my_volume
>>> my_density
unyt_quantity(0.05, 'g/(mol*nm**3)')
>>> my_density.in_units(u.Unit('g/(nm**3)'))
unyt_quantity(8.30269461e-26, 'g/nm**3')
>>> my_density/u.physical_constants.Na
unyt_quantity(8.3026952e-26, 'g/nm**3')

It looks like unyt can recognize 'per mol' quantities and scale by Avogadro's number appropriately

Originally posted by @ahy3nz in https://github.com/mattwthompson/topology/pull/47/files

Renaming "Topology"?

Posting random ideas if we choose not to stick with Topology and subtopology

Model/SubModel - in theory, this data structure includes all the information for how you want to model your system. Although dubbing our data structure a "Model" might sound kind of pretentious
MolecularModel/Model
Modelcule/Submodelcule - Less pretentious, but harder to take seriously
System/SubSystem - this is intermol

Code cleanliness standards

I don't want these to be rigid requirements, but we should strive to have some basics consistently

Docstrings

I like numpy style but there are others
Full docstrings for classes and public functions
One-liners for private functions

Classes

have a __repr__

I will edit this according to recommendations - obviously this is just a starting point

Code coverage

I like codecov because it comes with a bot that does coverage diffs on PRs. But until we make this public and have CI running, it's not worth more than a periodic check. We're at least off to a good start:

(master)$ coverage report -m
Name                                    Stmts   Miss  Cover   Missing
---------------------------------------------------------------------
topology/__init__.py                        2      0   100%
topology/core/__init__.py                   0      0   100%
topology/core/atom_type.py                  3      0   100%
topology/core/box.py                       44      3    93%   9, 88, 102
topology/core/connection.py                11      0   100%
topology/core/element.py                    8      0   100%
topology/core/site.py                      18      2    89%   13, 15
topology/core/topology.py                  40     11    72%   18, 30-33, 58-64
topology/external/__init__.py               0      0   100%
topology/external/convert_mbuild.py        34      1    97%   33
topology/external/convert_parmed.py        19      0   100%
topology/formats/__init__.py                0      0   100%
topology/formats/xyz.py                    23      4    83%   16-19, 28-31
topology/tests/__init__.py                  0      0   100%
topology/tests/test_atom_type.py            4      0   100%
topology/tests/test_box.py                 33      0   100%
topology/tests/test_connection.py          10      0   100%
topology/tests/test_convert_mbuild.py      14      0   100%
topology/tests/test_convert_parmed.py       8      0   100%
topology/tests/test_element.py              7      0   100%
topology/tests/test_site.py                 4      0   100%
topology/tests/test_topology.py            21      0   100%
topology/tests/test_xyz.py                  7      0   100%
topology/utils/__init__.py                  0      0   100%
topology/utils/io.py                        7      1    86%   24
---------------------------------------------------------------------
TOTAL                                     317     22    93%

Might be worth checking once a week or so.

dtype consistency

We should try to have consistent data types, i.e. not mixing np.float64 and np.float32. This applies to core functions, external conversions, and formats.

Also need to consider how unyt subclasses arrays, it looks like they try very hard to keep things np.float64:

>>> import numpy as np
>>> import unyt as u
>>> y = np.ones(4, dtype=np.float32)
>>> myvar = u.nm * y
>>> y.dtype
dtype('float32')
>>> myvar.dtype
dtype('float64')

2-D simulation cells

>>> from topology.core.box import Box
>>> Box([2, 2, 0])
Box(a=2.0, b=2.0, c=0.0, alpha=90.0, beta=90.0, gamma=90.0)

In the validation of box lengths, we currently allow for values of zero. For now we should prohibit that, I think, but

Is there any reason to support 2-D systems (or any other than 3-D)? I have not personally done this but LAMMPS has some support and I assume HOOMD does as well.
If we do, would it be better to store a c=0 or accomodate a box with no c (lengths, etc. with only 2 values)?

ethane.gro and ethane.top in utils do not agree

Ethane.top is fine (C2H6), but ethane.gro is (C3H5) with overlapping coordinates in index 2 and 5 (starting from index 1).

Support tabulated potentials?

It would be killer to support this with the sympy stuff:
https://hoomd-blue.readthedocs.io/en/stable/module-md-pair.html#hoomd.md.pair.table

Not sure if it is feasible or not, maybe if the functional form doesn't exist, it could warn then use tabulated? If hoomd supported tabulated bonds and angles, then you could do pretty arbitrary forms

Adding mixing rules and other LJ properties

For the gsd writer #47 , one next step is to write out all FF parameters to an external python module that can be called from the actual hoomd run script. For example:

force_field.py

import hoomd
default_lj_coeffs = {'alpha': 1.0, 'r_on': 12, 'r_cut': 12}
lj_coeffs = hoomd.md.pair.coeff()
lj_coeffs.default_coeff = default_lj_coeffs
lj_coeffs.set('A', 'A', epsilon=10, sigma=5.0)
  
harmonic_bond_coeffs = hoomd.md.bond.coeff()
harmonic_bond_coeffs.set('A-A', r0=4, k=100)

run.py:

import hoomd
import hoomd.md

sim1 = hoomd.context.initialize("")
with sim1:
    import force_field
    system = hoomd.init.read_gsd('bonded.gsd')
    nl = hoomd.md.nlist.cell()
    nl.reset_exclusions(['1-3', '1-2'])
    lj_potential = hoomd.md.pair.lj(12, nl)
    lj_potential.pair_coeff= force_field.lj_coeffs
    lj_potential.pair_coeff.default_coeff = force_field.lj_coeffs.default_coeff

    harmonic_bonds = hoomd.md.bond.harmonic()
    harmonic_bonds.bond_coeff = force_field.harmonic_bond_coeffs
    harmonic_bonds.bond_coeff.default_coeff= force_field.harmonic_bond_coeffs.default_coeff
    hoomd.md.integrate.mode_standard(dt=0.01)
    all = hoomd.group.all()
    integrator = hoomd.md.integrate.nvt(all, 2.5, 3)
    hoomd.dump.gsd('traj.gsd', 10, all, overwrite=True)
    hoomd.run(1000)

To move further and write force_field.py from a Topology, we will need to account for these hoomd specifications (or smartly infer/guess):

alpha
r_on
r_cut
mixing rules for sigma and epsilon
Other specifications, like the atom types or bond types are already accounted for in Topology

Handling `Site` Indices

As mentioned in #28, I think it's worth having a discussion about how to handle Site indices.

It's easy enough to gather Site indices through a list comprehension such as : [(i,j) for i,j in enumerate(topology.site_list)] but it's not immediately clear how these indices can be translated to other attributes like Site.connection_list.

External support

Keeping track of various classes in other packages we want to convert to and/or from

Data structure	Package	Typed or Un-typed?	Converting from	Converting to
Compound	mBuild	Un-typed	Support	Support
Structure	ParmEd	Either	Partial support	Partial support
Topology	MDTraj	Either
Topology/Modeller/System	OpenMM	Either?	#371	#99, #371
networkx	Graph	Either?
Open Babel	Molecule/others	Un-typed

Units

At the moment, we're all in agreement that we should have internal values tagged with some sort of a unit. There are probably a few ways to do this

External packages

There are many out there, I have only used simtk.unit that is somewhat entangled with OpenMM. We probably want to avoid this if it alone would force us to depend on OpenMM.

Internal

MDTraj has some internal code, though untouched for years and maybe not used much in the package itself. Doing this ourself would seem to more or less involve re-inventing the wheel. I guess the benefit would be not depending on an upstream package. I don't think this is worth doing.

Method to take box vectors as input and generate a Box class

To keep the Box class simple, we support only taking in [a,b,c][alpha, beta, gamma] as inputs.

#1 we agreed that it would make the most sense to just define methods as we needed them to read in other box definition formats. From #26 we need the support of reading in GRO file box formats and converting them to the inputs for our Box

Custom Exceptions

We'll probably come across special cases we'll want our own errors and warnings for. mbuild has a simple one but foyer has several specific ones associated with various validation issues

Definitions of supported functional forms for simulation engines

Disclaimer: I'm not 100% caught up on all the activity and discussion here and am still not very familiar with sympy, so feel free to close if this suggestion doesn't make sense.

It might be useful to keep some sort of library where we have files for each simulation engine that define the various supported functional forms for each interaction type. I'm thinking a file structure that's something like (with better names):

supported_functional_forms/
|-- gromacs/
| |-- nonbonded.txt
| |-- bond.txt
| |-- angle.txt
...

And then, e.g., nonbonded.txt would list the supported nonbonded functional forms:

4*epsilon*((sigma/r)**12 - (sigma/r)**6)
...

This way, if we have a topology with our various functional forms defined for each interaction, we can have a check that interfaces with the library when attempting to save to a certain format as to whether or not the engine supports the defined functional forms. Even better, we could have a property on the Topology that's something like supported_formats that could be queried to see which writers are available based on the defined functional forms for that topology.

There would likely be some snags with functional forms that are technically equivalent, but defined in slightly-different ways. This is where my lack of knowledge of sympy comes in, as I'm not sure how easy that would be to handle. It looks like a similar discussion is ongoing in #31 with regard to harmonic potentials with and without the 0.5 scaling.

xlo/xhi convention in LAMMPs (and perhaps GSD)

See below discussion in the LAMMPS PR:

We all need to have a conversation about how to define xlo, ylo, zlo at some point. It should not default to 0,0,0 i think. Either the origin could be the minimum x,y,z of all the sites, or the site that is the most negative in all three dimensions. Since lammps does not care about the origin location iirc.

Also, a bit of a cleaner implementation can be found in the conversation in #61

Originally posted by @justinGilmer in #28

Differentiating between parameters and input variables in sympy expressions

MWE

from topology.core.atom_type import AtomType
atom_type = AtomType()

atom_type.set_nb_function(function='4*epsilon*((sigma/r)**12 - (sigma/r)**6)', parameters={'sigma': 1, 'epsilon': 1})

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-94-d874b03461a8> in <module>()
----> 1 atom_type.set_nb_function(parameters={'sigma': 1, 'epsilon': 1})

~/software/topology/topology/core/atom_type.py in set_nb_function(self, function, parameters)
    118             self._parameters.update(parameters)
    119 
--> 120         self._validate_function_parameters()
    121 
    122     def _validate_function_parameters(self):

~/software/topology/topology/core/atom_type.py in _validate_function_parameters(self)
    140                              " symbols do not agree,"
    141                              " extraneous symbols:"
--> 142                              " {}".format(extra_syms))
    143 
    144     def __eq__(self, other):

ValueError: NB function and parameter symbols do not agree, extraneous symbols: {r}

We need to distinguish between the parameters (sigma, epsilon) and the other variables ('r' here, but 'theta', etc. elsewhere)

Use logging

https://docs.python.org/3/library/logging.html
Could be useful to specify debugging-level warnings/errors systematically across this package

Storing, tracking, and printing metadata

Different engines have different treatment of some high-level settings (see #65), so we should try and keep track of

If/where they are stored
If it is the scope of this package to care about it

Engine	Mixing rule	1-4 LJ	1-4 electrostatics	Pair potential cutoff	Exclusions
LAMMPS	input file; maybe	input file; maybe	input file; ?	input file; maybe	input file; maybe
GROMACS	TOP; yes	TOP; yes	TOP; yes	MDP; no	TOP; yes
HOOMD	N/A, would have to individually scale 1-4 pairs; maybe	N/A; maybe	N/A; maybe	input file/run script; maybe	input file/run script; maybe

In the case of LAMMPS, most of these are implicitly or explicitly stored in the input file, which doesn't make sense for us to write out completely. But we should at least try to inform the user of the relevant information, maybe including printing out snippets of what should be in the input file.