The Source tag is where we define the source of the parameters in the file. A few things to note here: if we are making a forcefield from a specific paper, we can mark that source as primary="True", since it would presumably be the most recent and most specific. If we are combining several sources to make a more robust forcefield, more than one can be set to primary="True".
We have an Author tag here that corresponds to who actually made the forcefield (or, if we are referencing a paper, who the authors of the paper were). Since a forcefield might not necessarily have a journal article, it seemed best to keep this separate from the Journal tag.
If there is a journal article associated with the parameters, the basic info can be provided in the Journal tag. Note that there is a doi attribute as well, since the source could be a repository with a Zenodo DOI (not necessarily a journal article). I didn't put an author list in the Journal tag. We could easily include one using the BibTeX format or something similar, but it would probably be the same as the authors we already list using the Author tag.
The Note tag gives any specific information that might be relevant, especially if we only use 1 or 2 parameters from a source, or if we are reusing parameters for a different angle (e.g., "CT-CT-F angles are assumed to be the same as CT-CT-OH in this manuscript").
If parameters come from a personal communication, we can note that here, and define the year and authors above.
The TestSuite section could presumably be generated automatically, but it would at least list which molecules were defined as tests.
I put together an example for the PFA forcefield. I think this will help us keep better track of not just source, but the relevance of those sources, and make it easier to parse this information. Again, we can use this xml file to generate the README.
<?xml version= "1.0"?>
<Foyer title="OPLS-AA parameters for perfluoroalkanes in Foyer format" website="https://github.com/chrisiacovella/oplsaa_perfluoroalkanes" family="OPLS-AA">
<Creator last="Iacovella" first="C.R."/>
<Source doi="10.1021/jp004071w" desc="All-atom OPLS parameters for perfluoroalkanes" primary="True" year="2001">
<Author last="Watkins" first="Edward K"/>
<Author last="Jorgensen" first="William L"/>
<Journal name="The Journal of Physical Chemistry A" volume="105" number="16" pages="4118--4125" year="2001" title="Perfluoroalkanes: Conformational analysis and liquid-state properties from ab initio and Monte Carlo calculations" doi="10.1021/jp004071w" />
<Note>The forcefield here describes the general parameters for perfluoroalkanes; specific dihedrals exist for 4 and 5-mers in the original manuscript </Note>
</Source>
<Source doi="10.1002/jcc.540130806" desc="CT-F Bond Source" primary="" year="1992">
<Author last="Gough" first="Craig A"/>
<Author last="DeBolt" first="Stephen E"/>
<Author last="Kollman" first="Peter A"/>
<Journal name="Journal of Computational Chemistry" volume="13" number="8" pages="963--970" year="1992" title="Derivation of fluorine and hydrogen atom parameters using liquid simulations" doi="10.1002/jcc.540130806" />
<Note> CT-F bonds are taken from parameters in this manuscript, as described in Watkins and Jorgensen. </Note>
</Source>
<Source doi="10.1021/ja00124a002" desc="F-CT-F and CT-CT-F angles" primary="" year="1995">
<Author last="Cornell" first="Wendy D"/>
<Author last="Cieplak" first="Piotr"/>
<Author last="Bayly" first="Christopher I"/>
<Author last="Gould" first="Ian R"/>
<Author last="Merz" first="Kenneth M"/>
<Author last="Ferguson" first="David M"/>
<Author last="Spellmeyer" first="David C"/>
<Author last="Fox" first="Thomas"/>
<Author last="Caldwell" first="James W"/>
<Author last="Kollman" first="Peter A"/>
<Journal name="Journal of the American Chemical Society" volume="117" number="19" pages="5179--5197" year="1995" title="A second generation force field for the simulation of proteins, nucleic acids, and organic molecules" doi="10.1021/ja00124a002" />
<Note> F-CT-F angles come from this manuscript, as described in Watkins and Jorgensen. </Note>
<Note> CT-CT-F angles are the same as CT-CT-OH and CT-CT-OS listed in this manuscript, as described in Watkins and Jorgensen. </Note>
</Source>
<Source doi="10.1021/ja9621760" desc="All-atom OPLS parameters for alkanes" primary="" year="1996">
<Author last="Jorgensen" first="William L"/>
<Author last="Maxwell" first="David S"/>
<Author last="Tirado-Rives" first="Julian"/>
<Journal name="Journal of the American Chemical Society" volume="118" number="45" pages="11225--11236" year="1996" title="Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids" doi="10.1021/ja9621760" />
<Note> Bonds and angles for the CT-CT and CT-CT-CT are taken from this manuscript for alkanes, as described in Watkins and Jorgensen. </Note>
</Source>
<AdditionalNote> The backbone dihedral specifically references opls_962 (i.e. C-CF2-C) rather than only using the "CT" class; if only the "CT" class were used, this would create a conflict with alkane systems if the parameters were merged. </AdditionalNote>
<AdditionalNote> The original parameters are defined in kcal/mol, whereas this file uses kJ/mol; a conversion factor of 4.184 was used, consistent with OpenMM. </AdditionalNote>
<AdditionalNote> PI is defined as 3.141592653589 for conversion to radians, consistent with OpenMM.</AdditionalNote>
<AdditionalNote> Atom type names, e.g., opls_961, correspond to those defined in the OPLS forcefield itp file distributed with GROMACS. </AdditionalNote>
<AdditionalNote> Conversion from OPLS-style dihedrals to RB follows the formulas detailed in the GROMACS manual. </AdditionalNote>
<TestSuite>
<Molecule name="CF4.mol2" status="PASS"/>
<Molecule name="perfluoro-2-methylbutane.mol2" status="PASS"/>
<Molecule name="perfluorohexane.mol2" status="PASS"/>
</TestSuite>
</Foyer>
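As a rough sketch of how the README could be generated from a file like the one above, the snippet below walks the Source elements and writes a Markdown summary of each one. The function name `sources_to_markdown`, the Markdown layout, and the trimmed-down `EXAMPLE` document are assumptions made for illustration, not an existing part of Foyer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the documentation file shown above (illustrative only).
EXAMPLE = """<Foyer title="OPLS-AA parameters for perfluoroalkanes in Foyer format">
  <Source doi="10.1021/jp004071w" primary="True" year="2001">
    <Author last="Watkins" first="Edward K"/>
    <Author last="Jorgensen" first="William L"/>
    <Note> General perfluoroalkane parameters. </Note>
  </Source>
  <Source doi="10.1002/jcc.540130806" primary="" year="1992">
    <Author last="Gough" first="Craig A"/>
    <Note> CT-F bonds are taken from this manuscript. </Note>
  </Source>
</Foyer>"""


def sources_to_markdown(xml_text):
    """Return a Markdown summary with one bullet per <Source> element."""
    root = ET.fromstring(xml_text)
    lines = [f"# {root.get('title')}", "", "## Sources", ""]
    for source in root.findall("Source"):
        authors = ", ".join(a.get("last") for a in source.findall("Author"))
        # Only sources explicitly marked primary="True" get flagged.
        flag = " (primary)" if source.get("primary") == "True" else ""
        lines.append(f"- {authors} ({source.get('year')}), doi:{source.get('doi')}{flag}")
        for note in source.findall("Note"):
            # Notes become nested bullets under their source.
            lines.append(f"  - {(note.text or '').strip()}")
    return "\n".join(lines)


readme_text = sources_to_markdown(EXAMPLE)
print(readme_text)
```

Keeping the author and year next to the DOI in the output means a reader can recognize the paper without looking the DOI up.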
It might be good to have the test suite section automatically generated when running the atomtyping.py test (the script that creates the readme could run the test suite too). E.g., it would update which molecules are in the tests and whether they were atom-typed correctly.
Following offline discussions, I'm going to try creating a minimal documentation file (e.g., one that only requires DOIs and notes, rather than full references). The parsing code will automatically gather the remaining info and write out both the Readme.md file and a "full" xml file.
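One way the TestSuite section might be regenerated is sketched below. It assumes the atom-typing test can hand back a plain mapping of molecule file name to pass/fail; that mapping and the helper name `update_test_suite` are hypothetical, and how atomtyping.py actually reports results is not shown here.

```python
import xml.etree.ElementTree as ET


def update_test_suite(root, results):
    """Rewrite <TestSuite> so it lists every tested molecule and its status.

    `results` maps a molecule file name to True (typed correctly) or False.
    """
    suite = root.find("TestSuite")
    if suite is None:
        suite = ET.SubElement(root, "TestSuite")
    # Drop stale entries so removed test molecules do not linger in the file.
    for child in list(suite):
        suite.remove(child)
    for name in sorted(results):
        status = "PASS" if results[name] else "FAIL"
        ET.SubElement(suite, "Molecule", {"name": name, "status": status})
    return root


# Hypothetical results, hard-coded here in place of a real test run.
doc = ET.fromstring(
    '<Foyer><TestSuite><Molecule name="old.mol2" status="PASS"/></TestSuite></Foyer>'
)
update_test_suite(doc, {"CF4.mol2": True, "perfluorohexane.mol2": False})
print(ET.tostring(doc, encoding="unicode"))
```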
We should also write out a bibtex file with the references included in the xml file.
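A minimal sketch of the BibTeX export, assuming the Source/Author/Journal layout from the example file. The citation key scheme (first author's last name plus year) and the fallback to `@misc` when there is no Journal tag are choices made here for illustration.

```python
import xml.etree.ElementTree as ET


def source_to_bibtex(source):
    """Build one BibTeX entry from a <Source> element."""
    authors = source.findall("Author")
    first_last = authors[0].get("last") if authors else "unknown"
    key = f"{first_last}{source.get('year', '')}"
    fields = {
        "author": " and ".join(f"{a.get('last')}, {a.get('first')}" for a in authors),
        "year": source.get("year"),
        "doi": source.get("doi"),
    }
    entry_type = "misc"  # e.g. a repository with only a Zenodo DOI
    journal = source.find("Journal")
    if journal is not None:
        entry_type = "article"
        fields.update(
            title=journal.get("title"),
            journal=journal.get("name"),
            volume=journal.get("volume"),
            number=journal.get("number"),
            pages=journal.get("pages"),
        )
    # Skip empty or missing attributes rather than writing blank fields.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@{entry_type}{{{key},\n{body}\n}}"


example_source = ET.fromstring(
    '<Source doi="10.1002/jcc.540130806" desc="CT-F Bond Source" primary="" year="1992">'
    '<Author last="Gough" first="Craig A"/>'
    '<Author last="Kollman" first="Peter A"/>'
    '<Journal name="Journal of Computational Chemistry" volume="13" number="8" '
    'pages="963--970" year="1992" '
    'title="Derivation of fluorine and hydrogen atom parameters using liquid simulations" '
    'doi="10.1002/jcc.540130806" />'
    "</Source>"
)
print(source_to_bibtex(example_source))
```

Writing the whole .bib file would then be a matter of joining the entries for every Source in the document with blank lines.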
Well, I guess I don't want the final readme or xml file to only have the DOIs; I can quickly see an author and a year and know what paper it is, but I'd have to do more work to actually look up the DOI. Glancing at the specs in the Readme for that other file, we might have some additional stuff automatically written to a readme:
Some boilerplate about foyer and file format
scan the actual forcefield file and list the functional forms used for bonded/nonbonded parameters
could have some additional boilerplate information that is grabbed automatically and defines how the format should actually be, like the order of atom_ids in an angle.
Count the number of atom types/bonds/angles/dihedrals in the file (it seems like it would be useful to understand how expansive the force field file is)
we could probably have a separate page generated (linked from the readme) that just lists the atom types, their description and their SMARTS string to make it a little more user friendly to see what is in the document.
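The scanning, counting, and atom-type listing ideas above could be sketched roughly as follows. This assumes the forcefield file uses the OpenMM-style layout that Foyer reads (an `AtomTypes` section of `Type` entries with `name`, `desc`, and `def` attributes, plus one section per force). The tiny `FF_EXAMPLE` document uses placeholder names and values, not real parameters.

```python
import xml.etree.ElementTree as ET

# Placeholder forcefield file: names, SMARTS, and numbers are dummies.
FF_EXAMPLE = """<ForceField>
  <AtomTypes>
    <Type name="type_A" class="CT" element="C" mass="0.0" def="[C]" desc="placeholder carbon"/>
    <Type name="type_B" class="F" element="F" mass="0.0" def="[F]" desc="placeholder fluorine"/>
  </AtomTypes>
  <HarmonicBondForce>
    <Bond class1="CT" class2="F" length="0.0" k="0.0"/>
  </HarmonicBondForce>
  <RBTorsionForce>
    <Proper class1="F" class2="CT" class3="CT" class4="F" c0="0.0" c1="0.0"/>
  </RBTorsionForce>
</ForceField>"""


def summarize_forcefield(xml_text):
    """Return (entries per top-level section, atom-type rows)."""
    root = ET.fromstring(xml_text)
    # The section tags double as the list of functional forms in use,
    # e.g. HarmonicBondForce or RBTorsionForce.
    counts = {section.tag: len(section) for section in root}
    atom_rows = [
        (t.get("name"), t.get("desc"), t.get("def"))
        for t in root.findall("./AtomTypes/Type")
    ]
    return counts, atom_rows


counts, atom_rows = summarize_forcefield(FF_EXAMPLE)
for tag, n in counts.items():
    print(f"{tag}: {n}")
for name, desc, smarts in atom_rows:
    print(f"| {name} | {desc} | `{smarts}` |")
```

The atom-type rows are printed as Markdown table rows so they could be dropped into a separate page linked from the readme.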
While we could certainly get DOIs directly from the forcefield xml file, I think a separate xml document would be good since I think it is essential we add some notes associated with each paper, considering most forcefield parameter sets have been derived/aggregated in not so standard ways. Also it allows a clear explanation as to which parameters were chosen when there are duplicates (e.g., when merging two force field files).
In any case, I'm working on updating the parsing code to automatically grab info from a DOI, and then populate the relevant fields.
from forcefield_template.
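For the DOI lookup mentioned above, one possible approach is to query the Crossref REST API and map the returned record onto the Source/Author/Journal layout. The sketch below is an assumption about how that could work, not the actual parsing code: the helper names are hypothetical, the record keys (`author`, `container-title`, `issued`, and so on) follow Crossref's works response as I understand it, and `SAMPLE_RECORD` is hand-written from the Gough et al. entry in the example file rather than fetched.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

CROSSREF_URL = "https://api.crossref.org/works/"


def fetch_crossref(doi):
    """Fetch the metadata record for a DOI (needs network access)."""
    with urllib.request.urlopen(CROSSREF_URL + urllib.parse.quote(doi)) as response:
        return json.load(response)["message"]


def source_from_record(doi, record, primary=""):
    """Populate a <Source> element from a Crossref-style record."""
    year = str(record["issued"]["date-parts"][0][0])
    source = ET.Element(
        "Source", {"doi": doi, "desc": "", "primary": primary, "year": year}
    )
    for author in record.get("author", []):
        ET.SubElement(
            source,
            "Author",
            {"last": author.get("family", ""), "first": author.get("given", "")},
        )
    ET.SubElement(
        source,
        "Journal",
        {
            # Crossref returns titles as lists; take the first entry if present.
            "name": (record.get("container-title") or [""])[0],
            "volume": record.get("volume", ""),
            "number": record.get("issue", ""),
            "pages": record.get("page", ""),
            "year": year,
            "title": (record.get("title") or [""])[0],
            "doi": doi,
        },
    )
    return source


# Hand-written record in the assumed Crossref shape (not fetched).
SAMPLE_RECORD = {
    "issued": {"date-parts": [[1992]]},
    "author": [{"family": "Gough", "given": "Craig A"}],
    "container-title": ["Journal of Computational Chemistry"],
    "volume": "13",
    "issue": "8",
    "page": "963-970",
    "title": ["Derivation of fluorine and hydrogen atom parameters using liquid simulations"],
}
source = source_from_record("10.1002/jcc.540130806", SAMPLE_RECORD)
print(ET.tostring(source, encoding="unicode"))
```

The desc attribute and the Note tags are left empty on purpose, since those are the parts that still need to be written by hand in the minimal documentation file.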