molssi / qcengine

Quantum chemistry program executor and IO standardizer (QCSchema).

Home Page: https://molssi.github.io/QCEngine/

License: BSD 3-Clause "New" or "Revised" License

Python 99.86% Shell 0.08% Makefile 0.05%

quantum-chemistry python3 computational-chemistry standards chemistry

qcengine's Introduction

QCEngine

Quantum chemistry program executor and IO standardizer (QCSchema).

Example

A simple example of QCEngine's capabilities is as follows:

>>> import qcengine as qcng
>>> import qcelemental as qcel

>>> mol = qcel.models.Molecule.from_data("""
O  0.0  0.000  -0.129
H  0.0 -1.494  1.027
H  0.0  1.494  1.027
""")

>>> inp = qcel.models.AtomicInput(
    molecule=mol,
    driver="energy",
    model={"method": "SCF", "basis": "sto-3g"},
    keywords={"scf_type": "df"}
    )

These input specifications can be executed with the compute function along with a program specifier:

>>> ret = qcng.compute(inp, "psi4")

The results contain a complete record of the computation:

>>> ret.return_result
-74.45994963230625

>>> ret.properties.scf_dipole_moment
[0.0, 0.0, 0.6635967188869244]

>>> ret.provenance.cpu
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

See the documentation for more information.

License

BSD-3C. See the License File for more information.


qcengine's Issues

Bootcamp: Changeset 4

diff --git a/qcengine/programs/base.py b/qcengine/programs/base.py
index 5970827..9c0bcce 100644
--- a/qcengine/programs/base.py
+++ b/qcengine/programs/base.py
@@ -5,11 +5,8 @@
 from typing import Set
 from ..exceptions import InputError, ResourceError
 
-from .cfour import CFOURHarness
 from .dftd3 import DFTD3Harness
 from .entos import EntosHarness
-from .gamess import GAMESSHarness
-from .nwchem import NWChemHarness
 from .molpro import MolproHarness
 from .mopac import MopacHarness
 from .mp2d import MP2DHarness
@@ -97,7 +94,4 @@ def list_available_programs() -> Set[str]:
 register_program(DFTD3Harness())
 register_program(TeraChemHarness())
 register_program(MP2DHarness())
-#register_program(GAMESSHarness())
-#register_program(NWChemHarness())
-#register_program(CFOURHarness())
 register_program(EntosHarness())

I installed QCEngine on a machine that had Psi4 v1.2.1 installed through Conda, and it was failing the Psi4-related tests without flagging a recognizable error related to the incompatible Psi4 version.

program options beyond flat plain-old-data

At some point QCEngine will have to confront what options look like in non psi4/cfour/qchem-like programs. For example,

dft
  direct
  ...
end

is how NWChem turns on the (boolean) direct algorithm for DFT. Another example is ESTATE=0/1/0/0 for an array variable in CFOUR. Even if the user knew the option and value they wanted (respectively, dft direct on and B1 state only in C2v), the settings of the keywords block in QCSchema would be very different depending on whether they knew Python best (dft_direct = True, estate=[0, 1, 0, 0]) or the target program's domain language best. Naturally, the input file must be formattable from the QCSchema keywords dict.

My philosophy has been that the keyword RHS must be in natural python format (True, [0, 1, 0, 0]) and the LHS must be predictable by someone who knows the program DSL (domain specific lang) with double underscore being any module separator, so dft__direct and estate. That way, we’re only transforming, not making a new DSL. Somehow, will have to work Molpro into this.

This much, as I see it, is in the qcengine domain, not the qcdb (which is concerned with translating LHS options). Any concerns/disputes/that-doesn’t-belong-here-arguments before I act like this is qcng’s philosophy, too?

qcschema precedence is not fleshed out
On __-separated vs. nested dict: I used to use the nested dict but find the __ separator much easier on the user. Since nested dict is an intermediate in: __-sep-string --> nested-dict --> formatted-input, I can see allowing either at the qcng level.
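
The __-sep-string --> nested-dict step can be sketched as follows. This is only an illustration of the transformation, not QCEngine code: the function name expand_keywords and its conflict handling are assumptions made for the example.

```python
from typing import Any, Dict


def expand_keywords(flat: Dict[str, Any], sep: str = "__") -> Dict[str, Any]:
    """Expand {"dft__direct": True} into {"dft": {"direct": True}}.

    Hypothetical helper: keys without the separator stay at the top level.
    """
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *modules, option = key.split(sep)
        level = nested
        for module in modules:
            # Descend into (or create) the block for this module.
            level = level.setdefault(module, {})
            if not isinstance(level, dict):
                raise ValueError(f"'{key}' conflicts with a plain keyword")
        if isinstance(level.get(option), dict):
            raise ValueError(f"'{key}' conflicts with an existing block")
        level[option] = value
    return nested


keywords = {"dft__direct": True, "estate": [0, 1, 0, 0]}
nested = expand_keywords(keywords)
print(nested)
```

The right-hand sides stay in natural Python form throughout; only the key layout changes, which is why either representation could be accepted at the qcng level.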

DFTD3 Method error for incorrect atom distances

Describe the bug
This is more of a hunch than a hard error. In several cases where jobs were submitted to QCArchive with coordinates in Angstrom rather than Bohr, DFTD3's error message was "Unsuccessful run. Possibly -D variant not available in dftd3 version.".

Additional context
It would be good to run this and see if we can break out "atom too close" errors from variant errors.
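
One way to separate the two failure modes would be a geometry sanity check before DFTD3 is called. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the 1.2 Bohr threshold is an arbitrary value picked for the example (slightly below the H2 bond length of about 1.4 Bohr), not something taken from DFTD3 or QCEngine.

```python
import itertools
import math
from typing import Sequence, Tuple


def closest_pair(geometry: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (distance, i, j) for the closest pair of atoms."""
    best = (math.inf, -1, -1)
    for (i, a), (j, b) in itertools.combinations(enumerate(geometry), 2):
        dist = math.dist(a, b)
        if dist < best[0]:
            best = (dist, i, j)
    return best


def check_geometry(geometry: Sequence[Sequence[float]], min_distance: float = 1.2) -> None:
    """Raise if any two atoms are closer than min_distance (Bohr, assumed threshold)."""
    dist, i, j = closest_pair(geometry)
    if dist < min_distance:
        raise ValueError(
            f"atoms {i} and {j} are {dist:.3f} Bohr apart; "
            "coordinates may have been supplied in Angstrom"
        )


# Water geometry from the README example, in Bohr.
water_bohr = [[0.0, 0.0, -0.129], [0.0, -1.494, 1.027], [0.0, 1.494, 1.027]]
check_geometry(water_bohr)  # passes silently
```

A check like this would raise a distinct error for suspicious geometries, leaving the "-D variant not available" message for genuine variant problems.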

`stdout` Logging Stream

An option for a stdout logging stream should be added instead of insisting on a file.
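
A minimal sketch of what such an option could look like with the standard logging module. The function name get_logger, the logger name, and the logfile parameter are hypothetical and do not reflect QCEngine's actual configuration code.

```python
import logging
import sys
from typing import Optional


def get_logger(name: str = "qcengine_sketch", logfile: Optional[str] = None) -> logging.Logger:
    """Log to `logfile` if given, otherwise to stdout (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    if logfile:
        handler: logging.Handler = logging.FileHandler(logfile)
    else:
        handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger


logger = get_logger()
logger.info("job started")
```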

Add more tests to check different input types

Since the addition of the Pydantic models from QCElemental, we will want to add more tests to ensure that the different input types (dict or Pydantic) are both checked.

Complex, ordered multi-command input for Molpro

Is your feature request related to a problem? Please describe.
Molpro input files are very complex, and depend on an imperative programming structure. For example, CCSD can only be performed after a HF command. This leads to very complicated input files, such as the following:

memory,300,m

symmetry,nosym
geometry=geo.xyz

basis=cc-pVDZ

gthresh,ENERGY=1e-10
{df-rhf,;save,2100.2}

{ibba,antibonds=1,THRLOC_IB=1e-12;freezecore;orbital,2100.2}
{locali,boys;print,orbital;fock,2100.2;order,type=fock}

{df-lccsd(t),chgfrac=1.0,canblk=0;core,6}

I want a way to specify this input file inside an AtomicInput.

Describe the solution you'd like
A format for specifying commands, in order. Each command must support args, kwargs, and directives. Directives themselves have args and kwargs. args and kwargs are not ordered. Directives are not ordered. (@sjrl please verify this last point).

One possible format would be:

keywords["commands"]: List[Commands]

class Directive: 
    name: str
    args: Optional[List[Union[str,int,bool,float]]]
    kwargs: Optional[Dict[str, Union[str,int,bool,float]]]

class Command(Directive): 
    directives: Set[Directive]

Under this structure, the above input file would become:

method = "df-lccsd(t)"
basis = "cc-pVDZ"  # note that this replaces the basis command
molecule = Molecule.from_file("geo.xyz")
keywords = {
    "memory": 300,
    "symmetry": "nosym",
    "gthresh": "energy=1e-10",  # gthresh is weird, and this is a lame solution
    "commands": [
        {"name": "df-rhf", "directives": {{"name": "save", "args": ["2100.2"]}}},
        {
            "name": "ibba",
            "kwargs": {"antibonds": True, "thrloc_ib": 1e-12},
            "directives": {
                {"name": "freezecore"},
                {"name": "orbital", "args": ["2100.2"]},
            },
        },
        {
            "name": "locali",
            "args": ["boys"],
            "directives": {
                {"name": "print", "args": ["orbital"]},
                {"name": "fock", "args": ["2100.2"]},
                {"name": "order", "kwargs": {"type": "fock"}},
            },
        },
        {
            "name": "df-lccsd(t)",
            "kwargs": {"chgfrac": 1.0, "canblk": 0},
            "directives": {{"name": "core", "args": [6]},},
        },
    ],
}
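
To make the proposal concrete, here is a rough sketch of how such commands could be rendered into Molpro-style lines. It is not the Molpro harness: the render methods and the _fmt helper are hypothetical, directives are held in a list rather than a set (plain dicts and dataclasses with list fields are not hashable), and the output is not validated against Molpro's grammar.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Value = Union[str, int, bool, float]


def _fmt(value: Value) -> str:
    # Assumption: booleans are written as 1/0, as in "antibonds=1".
    return str(int(value)) if isinstance(value, bool) else str(value)


@dataclass
class Directive:
    name: str
    args: List[Value] = field(default_factory=list)
    kwargs: Dict[str, Value] = field(default_factory=dict)

    def render(self) -> str:
        parts = [self.name]
        parts += [_fmt(a) for a in self.args]
        parts += [f"{k}={_fmt(v)}" for k, v in self.kwargs.items()]
        return ",".join(parts)


@dataclass
class Command(Directive):
    directives: List[Directive] = field(default_factory=list)

    def render(self) -> str:
        # The command itself comes first, then its directives, separated by ";".
        pieces = [super().render()] + [d.render() for d in self.directives]
        return "{" + ";".join(pieces) + "}"


commands = [
    Command("df-rhf", directives=[Directive("save", args=["2100.2"])]),
    Command(
        "ibba",
        kwargs={"antibonds": True, "thrloc_ib": 1e-12},
        directives=[Directive("freezecore"), Directive("orbital", args=["2100.2"])],
    ),
    Command(
        "df-lccsd(t)",
        kwargs={"chgfrac": 1.0, "canblk": 0},
        directives=[Directive("core", args=[6])],
    ),
]
text = "\n".join(c.render() for c in commands)
print(text)
```

The list of commands preserves order, which is the property the dunder representation lacks.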

Describe alternatives you've considered

Template files (current solution).
dunder representation of nested dictionaries. Will lead to extremely long strings, and does not have the necessary ordered property of commands.

@loriab @sjrl

Document program detection

The documentation section "Environment Detection" should provide an example of how to see which programs were detected. I think that qcengine.list_available_programs() does this? (Same goes for procedures.)

Testing import standardization

Currently in the tests there is both from qcengine.testing import ... and from QCEngine import testing. We should settle on a single strategy for code cleanliness. I would propose from qcengine.testing import ... so that this strategy works for both functions and fixtures.

NWChem CI

NWChem isn't on conda, but it is available through several package managers. From their docs:

Debian: https://packages.debian.org/search?keywords=nwchem
Ubuntu: https://launchpad.net/ubuntu/+source/nwchem
Fedora and EPEL: https://admin.fedoraproject.org/updates/search/nwchem
Good search engine for NWChem Linux packages: http://pkgs.org/search/nwchem

It would be good to install this package and run it through our tests on Travis.

psi4 CCSD(T) D1/D2 diagnostics not reported for open shell case

I use the following block of code to do CCSD(T) calculations

import qcelemental
import qcengine

# mol is a qcelemental.models.Molecule defined elsewhere
psi4_atom_task = qcelemental.models.ResultInput(
    molecule=mol,
    driver="energy",
    model={"method": "ccsd", "basis": "6-31g"},
)
ret = qcengine.compute(psi4_atom_task, "psi4")

ret.dict()['extras']

If the mol is closed shell, then the result in 'extras' includes non-zero values of the CC D1 DIAGNOSTIC and CC D2 DIAGNOSTIC. Example output for a singlet water molecule:

{'qcvars': {'-D ENERGY': 0.0,
  'CC D1 DIAGNOSTIC': 0.015061336005482073,
  'CC D2 DIAGNOSTIC': 0.12393617164619228,
  'CC NEW D1 DIAGNOSTIC': 0.015061336005482073,
  'CC T1 DIAGNOSTIC': 0.006755831075299291,
  'CCSD CORRELATION ENERGY': -0.13940696350922102,
  'CCSD OPPOSITE-SPIN CORRELATION ENERGY': -0.11488872459315766,
  'CCSD SAME-SPIN CORRELATION ENERGY': -0.02451823891606328,
  'CCSD TOTAL ENERGY': -76.1195648218788,

However, if the calculation is for an open shell system, CC D1 DIAGNOSTIC and CC D2 DIAGNOSTIC will always be zero regardless of molecule species. I suspect that QCEngine is not parsing out the D1 and D2 values for these calculations, resulting in zero values all the time.

Add list of programs currently supported in docs

QCEngine supports executing a variety of quantum chemistry, semiempirical, and molecular mechanics programs. The programs currently supported appear to be:

> from qcengine import list_all_programs
> list_all_programs()
{'dftd3',
 'entos',
 'molpro',
 'mopac',
 'mp2d',
 'psi4',
 'rdkit',
 'terachem',
 'torchani'}

The docs do not currently indicate what QCEngine supports. I propose we add a docs page with the list of currently-supported programs, possibly with comments as to what is and is not supported for each.

MOPAC Codecov

MOPAC Codecov is not being uploaded due to "git not found":

/bin/sh: git: command not found
/bin/sh: hg: command not found
/bin/sh: git: command not found
/bin/sh: hg: command not found

      _____          _
     / ____|        | |
    | |     ___   __| | ___  ___ _____   __
    | |    / _ \ / _  |/ _ \/ __/ _ \ \ / /
    | |___| (_) | (_| |  __/ (_| (_) \ V /
     \_____\___/ \____|\___|\___\___/ \_/
                                    v2.0.15

==> Detecting CI provider
    Error running `git rev-parse --abbrev-ref HEAD || hg branch`: None
  -> Got branch from git/hg
  -> Got sha from git/hg
==> Preparing upload
Error: Commit sha is missing. Please specify via --commit=:she

This used to work, so unsure what happened. @Lnaden any ideas?

https://dev.azure.com/MolSSI/QCArchive/_build/results?buildId=107&view=logs&j=0e9986bc-4438-57a6-7391-1704fabd60a9&t=9bc28805-ff26-57a6-76fd-c5967bb8a1e9

Add types to execute function of ProgramHarness base class

A reminder to add types to the execute function of the ProgramHarness base class and then propagate those changes to all Harnesses.

QCEngine/qcengine/programs/model.py

Line 61 in f76e107

    
           def execute(self, inputs, extra_outfiles=None, extra_commands=None, scratch_name=None, timeout=None):

TorchANI

Add TorchANI to QCEngine: https://github.com/aiqm/torchani

Implement DFT-D4 harness

Is your feature request related to a problem? Please describe.
The DFT-D4 dispersion correction has been recently published and should be made available for commonly used quantum chemistry packages.

Describe the solution you'd like
An integration via the dftd4 C-API or Python-API in QCEngine.

Describe alternatives you've considered
IO based integration would be possible but is not really desirable.

Additional context
The reference implementation of dftd4 is available here: https://github.com/dftd4/dftd4.
A C++ ported version of dftd4 is available (used for ORCA 4.2.0).
~~I am also volunteering to implement the harness in QCEngine.~~ (currently not enough time)

Update:
dftd4 is now available via conda-forge: https://anaconda.org/conda-forge/dftd4.

Python-based codes to add to QCEngine

Several codes that may be relatively easy to add are:

Compute:

PySCF

Optimizers:

Pydantic Options

Move to pydantic options rather than raw dict for two reasons:

Validation is automatic and error message will look the same across QCA ecosystem,
Merging options is straightforward for local options, #3.

tests installation

Somehow that manifest isn't sending the message that the programs/tests/ need to be installed. (Hence psi's unhappiness, as its tests borrow data from qcng.)

python-scripts/qcengine
site-packages/qcengine-0.7.0.dist-info/INSTALLER
site-packages/qcengine-0.7.0.dist-info/LICENSE
site-packages/qcengine-0.7.0.dist-info/METADATA
site-packages/qcengine-0.7.0.dist-info/RECORD
site-packages/qcengine-0.7.0.dist-info/WHEEL
site-packages/qcengine-0.7.0.dist-info/entry_points.txt
site-packages/qcengine-0.7.0.dist-info/top_level.txt
site-packages/qcengine/__init__.py
site-packages/qcengine/_version.py
site-packages/qcengine/cli.py
site-packages/qcengine/compute.py
site-packages/qcengine/config.py
site-packages/qcengine/exceptions.py
site-packages/qcengine/extras.py
site-packages/qcengine/procedures/__init__.py
site-packages/qcengine/procedures/base.py
site-packages/qcengine/procedures/geometric.py
site-packages/qcengine/procedures/model.py
site-packages/qcengine/programs/__init__.py
site-packages/qcengine/programs/base.py
site-packages/qcengine/programs/cfour/__init__.py
site-packages/qcengine/programs/cfour/harvester.py
site-packages/qcengine/programs/cfour/keywords.py
site-packages/qcengine/programs/cfour/runner.py
site-packages/qcengine/programs/dftd3.py
site-packages/qcengine/programs/empirical_dispersion_resources.py
site-packages/qcengine/programs/entos.py
site-packages/qcengine/programs/gamess/__init__.py
site-packages/qcengine/programs/gamess/harvester.py
site-packages/qcengine/programs/gamess/runner.py
site-packages/qcengine/programs/model.py
site-packages/qcengine/programs/molpro.py
site-packages/qcengine/programs/mp2d.py
site-packages/qcengine/programs/nwchem.py
site-packages/qcengine/programs/psi4.py
site-packages/qcengine/programs/rdkit.py
site-packages/qcengine/programs/terachem.py
site-packages/qcengine/programs/torchani.py
site-packages/qcengine/programs/util/__init__.py
site-packages/qcengine/programs/util/hessparse.py
site-packages/qcengine/programs/util/pdict.py
site-packages/qcengine/stock_mols.py
site-packages/qcengine/testing.py
site-packages/qcengine/tests/__init__.py
site-packages/qcengine/tests/test_config.py
site-packages/qcengine/tests/test_procedures.py
site-packages/qcengine/tests/test_program_utils.py
site-packages/qcengine/tests/test_standard_suite.py
site-packages/qcengine/units.py
site-packages/qcengine/util.py

nwchem exe available through zsh but qcengine doesn't notice it until switch to bash

My default shell has been zsh but qcengine doesn't recognize my nwchem executable until I switch my shell back to bash.

Specifically, when I run qcengine info programs in zsh, qcengine returns no available programs. When I run the same command in bash, qcengine returns nwchem as I expected.

MOPAC nthread trials

MOPAC sets nthreads internally based on MKL_NUM_THREADS. We need to double check that this is being passed down correctly before deploying on HPC machines.
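
A small sketch of how the environment variable could be checked end to end. The helper run_with_threads is hypothetical, and a child Python process stands in for the MOPAC executable so the example runs anywhere; it only shows that the variable reaches the subprocess.

```python
import os
import subprocess
import sys
from typing import List


def run_with_threads(command: List[str], ncores: int) -> str:
    """Run `command` with MKL_NUM_THREADS set for the child process only."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["MKL_NUM_THREADS"] = str(ncores)
    proc = subprocess.run(command, env=env, capture_output=True, text=True, check=True)
    return proc.stdout


# Stand-in for the MOPAC executable: echo the variable the child actually sees.
probe = [sys.executable, "-c", "import os; print(os.environ['MKL_NUM_THREADS'])"]
seen = run_with_threads(probe, 4).strip()
print(seen)
```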

Bootcamp: Changeset 6

diff --git a/qcengine/programs/molpro.py b/qcengine/programs/molpro.py
index d618956..8054308 100644
--- a/qcengine/programs/molpro.py
+++ b/qcengine/programs/molpro.py
@@ -4,7 +4,7 @@ Calls the Molpro executable.

 import string
 import xml.etree.ElementTree as ET
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Set, Tuple, Optional

 from qcelemental.models import Result
 from qcelemental.util import parse_version, safe_version, which
@@ -26,7 +26,7 @@ class MolproHarness(ProgramHarness):
     version_cache: Dict[str, str] = {}

     # Set of implemented dft functionals in Molpro according to dfunc.registry (version 2019.2)
-    _dft_functionals = {
+    _dft_functionals: Set[str] = {
         "B86MGC", "B86R", "B86", "B88C", "B88", "B95", "B97DF", "B97RDF", "BR", "BRUEG", "BW", "CS1", "CS2",
         "DIRAC", "ECERFPBE", "ECERF", "EXACT", "EXERFPBE", "EXERF", "G96", "HCTH120", "HCTH147",
         "HCTH93", "HJSWPBEX", "LTA", "LYP", "M052XC", "M052XX", "M05C", "M05X", "M062XC",
@@ -49,9 +49,9 @@ class MolproHarness(ProgramHarness):
     }

     # Currently supported methods in QCEngine for Molpro
-    _scf_methods = {"HF", "RHF", "KS", "RKS"}
-    _post_hf_methods = {'MP2', 'CCSD', 'CCSD(T)'}
-    _supported_methods = {*_scf_methods, *_post_hf_methods}
+    _scf_methods: Set[str] = {"HF", "RHF", "KS", "RKS"}
+    _post_hf_methods: Set[str] = {'MP2', 'CCSD', 'CCSD(T)'}
+    _supported_methods: Set[str] = {*_scf_methods, *_post_hf_methods}

     class Config(ProgramHarness.Config):
         pass
@@ -113,10 +113,10 @@ class MolproHarness(ProgramHarness):
                 extra_infiles: Optional[List[str]] = None,
                 extra_outfiles: Optional[List[str]] = None,
                 as_binary: Optional[List[str]] = None,
-                extra_commands=None,
+                extra_commands: bool = None,
                 scratch_name: Optional[str] = None,
                 scratch_messy: bool = False,
-                timeout: Optional[int] = None):
+                timeout: Optional[int] = None) -> Tuple[bool, Dict[str, Any]]:
         """
         For option documentation go look at qcengine/util.execute
         """

standardize job name, commands

two items for a standardization pass:

(A) insofar as programs allow, should we standardize on a filename in qcng? e.g., qcengine_job.[in|out|inp|nw|mop] so it's clearer what are placeholders vs. runtime details.
(B) commands and extra_commands are deceptive in that while one can have arguments aplenty, qcng.util.execute is adamant about running only one command. propose singularizing for clarity.

GeomeTRIC does not add a provenance

Currently geomeTRIC does not provide a provenance. We should also add a check in our standard test suite for provenance info.

Virtual core allocations

Check that virtual cores and limited core subscriptions are allocated correctly. This often comes up on VMs and supercomputers where a single node is not fully given to individual jobs. Appears to work on Travis, AWS, and ARC.
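
As a starting point for such a check, the sketch below compares the cores the process may actually use with the machine total. The function usable_cores is hypothetical. os.sched_getaffinity is only available on some platforms (notably Linux), hence the fallback, and container CPU quotas may not be reflected by either number.

```python
import os


def usable_cores() -> int:
    """Cores this process may run on, falling back to the machine total."""
    if hasattr(os, "sched_getaffinity"):
        # Respects affinity masks set by schedulers or taskset.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


ncores = usable_cores()
print(f"usable cores: {ncores}, machine total: {os.cpu_count()}")
```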

version collection on Windows

Describe the bug
Idk if it's Azure, Windows, or general script echoing, but the usual version printing is incorporating the path, and then the safe_version madly join/hyphenates the result. I'll fix this for psi4, so this is an fyi should others hit bizarre versions.

    def get_version(self) -> str:
        self.found(raise_error=True)

        which_prog = which("psi4")
        print("v0:", which_prog)
        print("v1:", self.version_cache)
        with popen([which_prog, "--version"]) as exc:
            exc["proc"].wait(timeout=30)
        print("v2:", exc["stdout"])
        print("v3:", exc["stdout"].strip())
        print("v4:", safe_version(exc["stdout"]))
        if which_prog not in self.version_cache:
            with popen([which_prog, "--version"]) as exc:
                exc["proc"].wait(timeout=30)
            self.version_cache[which_prog] = safe_version(exc["stdout"])

2019-12-15T22:49:01.4454072Z v2: 
2019-12-15T22:49:01.4454364Z 
2019-12-15T22:49:01.4454695Z D:\a\1\b>C:/tools/miniconda3/python.exe D:\a\1\b\install\bin\psi4 --version 
2019-12-15T22:49:01.4454997Z 
2019-12-15T22:49:01.4455300Z 1.4a2.dev345
2019-12-15T22:49:01.4455611Z 
2019-12-15T22:49:01.4455891Z 
2019-12-15T22:49:01.4456222Z v3: D:\a\1\b>C:/tools/miniconda3/python.exe D:\a\1\b\install\bin\psi4 --version 
2019-12-15T22:49:01.4456545Z 
2019-12-15T22:49:01.4456851Z 1.4a2.dev345
2019-12-15T22:49:01.4457188Z v4: -D-a-1-b-C-tools-miniconda3-python.exe.D-a-1-b-install-bin-psi4.-version.-1.4a2.dev345-
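
One possible workaround, sketched under the assumption that the version is always the last non-empty line of stdout. The helper name last_nonempty_line is hypothetical and this is not necessarily the fix applied to the psi4 harness; the cleaned string would still need to go through safe_version afterwards.

```python
from typing import Optional


def last_nonempty_line(stdout: str) -> Optional[str]:
    """Return the last non-blank line, dropping any echoed command line."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    return lines[-1] if lines else None


# Output shaped like the "v2" log above: echoed command, blank lines, version.
noisy = (
    "\n"
    "D:\\a\\1\\b>C:/tools/miniconda3/python.exe D:\\a\\1\\b\\install\\bin\\psi4 --version \n"
    "\n"
    "1.4a2.dev345\n"
    "\n"
)
version = last_nonempty_line(noisy)
print(version)
```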

Test returning as Pydantic objects and not just dicts

We only return dicts now. We will want to test the return of Pydantic objects as well, including when errors are thrown.

dftd3 only works on Unix systems

Pytests using dftd3 fail on non-Unix systems.

Generate `basis` if not specified in OpenMMHarness

During #151, it was decided that the basis field in AtomicInput for the OpenMMHarness need not be specified. When not specified, it would be generated from the contents of url or offxml. This is partially implemented, but in a fragile way that is not desirable, in OpenMMHarness._generate_basis.

Ideally, the basis contents would be generated as:

f"{forcefield_name}-{hash(forcefield_schema)}" for non-versioned forcefields
f"{forcefield_name}-{forcefield_version}" for versioned forcefields

However, there does not appear to exist a way to pull forcefield_version from the XML contents of an e.g. SMIRNOFF force field. This will likely require this addition to future releases of SMIRNOFF force fields.
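
The two naming rules could be sketched like this. Everything here is an assumption for illustration: the function name basis_label, the example force field names, and the use of a SHA-1 digest of canonical JSON in place of hash(), since Python's built-in hash() of strings is salted per process and would not give reproducible labels.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def basis_label(name: str, schema: Dict[str, Any], version: Optional[str] = None) -> str:
    """Build a basis string from a force field name plus version or content hash."""
    if version is not None:
        return f"{name}-{version}"
    # Canonical JSON so that key order does not change the digest.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return f"{name}-{digest[:12]}"


print(basis_label("myff", {"bonds": [1, 2]}, version="1.0.0"))
print(basis_label("custom", {"bonds": [1, 2], "angles": []}))
```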

Bootcamp: Changeset 1

diff --git a/qcengine/programs/nwchem/harvester.py b/qcengine/programs/nwchem/harvester.py
index fa3759d..273a896 100644
--- a/qcengine/programs/nwchem/harvester.py
+++ b/qcengine/programs/nwchem/harvester.py
@@ -1,11 +1,10 @@
 import re
 from decimal import Decimal
 
-import numpy as np
 import qcelemental as qcel
 from qcelemental.models import Molecule
 
-from ..util import load_hessian, PreservingDict
+from ..util import PreservingDict
 
 def harvest_output(outtext):
     """Function to separate portions of a NWChem output file *outtext*,

Standard QC test suite

A standard test suite for QC programs should be curated that runs over several dimensions at a minimum:

Driver: energy/gradient/Hessian
Reference: UHF/RHF
Relevant procedures: optimization

Expanded `data` options for `run` and `run-procedure` CLI

CLI should allow data input in all serialization formats supported by QCElemental, namely:

JSON
MsgPack

run and run-procedure CLI commands should have the additional optional arguments:

--input-encoding (default: None). None would try everything; this would need to be implemented in qcelemental.basemodels.ProtoModel.parse_raw.
--output-encoding (default: JSON).
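
A rough sketch of the two options using argparse. The parser, the decode helper, and the DECODERS table are hypothetical stand-ins, not the QCEngine CLI or the proposed parse_raw change; msgpack is registered only if the package happens to be installed.

```python
import argparse
import json
from typing import Any, Callable, Dict, Optional

# Decoders are tried in insertion order when no encoding is given.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    "json": lambda raw: json.loads(raw.decode("utf-8")),
}
try:
    import msgpack  # optional third-party dependency

    DECODERS["msgpack"] = lambda raw: msgpack.unpackb(raw, raw=False)
except ImportError:
    pass


def decode(raw: bytes, encoding: Optional[str] = None) -> Any:
    """Decode with the named encoding, or try every known decoder if None."""
    candidates = [encoding] if encoding else list(DECODERS)
    errors = []
    for name in candidates:
        try:
            return DECODERS[name](raw)
        except Exception as exc:  # unknown encoding or failed parse
            errors.append(f"{name}: {exc!r}")
    raise ValueError("could not decode input; tried " + "; ".join(errors))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="qcengine-sketch")
    parser.add_argument("--input-encoding", choices=["json", "msgpack"], default=None)
    parser.add_argument("--output-encoding", choices=["json", "msgpack"], default="json")
    return parser


args = build_parser().parse_args(["--output-encoding", "json"])
payload = decode(b'{"driver": "energy"}', args.input_encoding)
print(payload)
```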

Error classification

It would be good to add error classification to these models so that downstream programs can make decisions on what should happen. Several examples:

InputError - (non-recoverable) error in the user input (e.g., incorrect keyword or method)
SetupError - (recoverable) example: scratch directory is not writeable
ConvergenceError - (recoverable) likely requires options tweaking to enhance iterative convergence.
RandomError - (recoverable) random seg fault or the like.

Recoverable/non-recoverable in a distributed computing sense where an upstream manager can make the decision to resubmit.

My initial thought is to build these as Exception classes so that the compute command can capture them and either process them into a proper JSON error message or let them raise. I am usually not a fan of custom error classes, but this is a good case: we want to elicit different behaviors depending on the type of error.

It would be good to kick around the different error types for a bit before implementing.
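
To have something concrete to kick around, the classes above could be sketched as below. The base class name, the recoverable attribute, and the to_error_dict helper are assumptions made for this example, not existing QCEngine API.

```python
from typing import Any, Dict


class SketchError(Exception):
    """Hypothetical base class; subclasses declare whether a retry makes sense."""

    error_type = "unknown_error"
    recoverable = False


class InputError(SketchError):
    error_type = "input_error"
    recoverable = False  # bad keyword or method: resubmitting will not help


class SetupError(SketchError):
    error_type = "setup_error"
    recoverable = True  # e.g. scratch directory is not writeable


class ConvergenceError(SketchError):
    error_type = "convergence_error"
    recoverable = True  # likely needs option tweaks


class RandomError(SketchError):
    error_type = "random_error"
    recoverable = True  # e.g. a random seg fault


def to_error_dict(exc: Exception) -> Dict[str, Any]:
    """Turn any exception into a JSON-friendly record a manager can act on."""
    return {
        "error_type": getattr(exc, "error_type", "unknown_error"),
        "error_message": str(exc),
        "recoverable": getattr(exc, "recoverable", False),
    }


try:
    raise ConvergenceError("SCF did not converge")
except SketchError as exc:
    record = to_error_dict(exc)
print(record)
```

An upstream manager could then resubmit only when the record's recoverable flag is true.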

CLI tests

Tests are needed for the CLI. Testing technology from QCFractal should be ported over.

Bootcamp: Changeset 5

diff --git a/qcengine/programs/molpro.py b/qcengine/programs/molpro.py
index d618956..8054308 100644
--- a/qcengine/programs/molpro.py
+++ b/qcengine/programs/molpro.py
@@ -4,7 +4,7 @@ Calls the Molpro executable.

 import string
 import xml.etree.ElementTree as ET
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Set, Tuple, Optional

 from qcelemental.models import Result
 from qcelemental.util import parse_version, safe_version, which
@@ -26,7 +26,7 @@ class MolproHarness(ProgramHarness):
     version_cache: Dict[str, str] = {}

     # Set of implemented dft functionals in Molpro according to dfunc.registry (version 2019.2)
-    _dft_functionals = {
+    _dft_functionals: Set[str] = {
         "B86MGC", "B86R", "B86", "B88C", "B88", "B95", "B97DF", "B97RDF", "BR", "BRUEG", "BW", "CS1", "CS2",
         "DIRAC", "ECERFPBE", "ECERF", "EXACT", "EXERFPBE", "EXERF", "G96", "HCTH120", "HCTH147",
         "HCTH93", "HJSWPBEX", "LTA", "LYP", "M052XC", "M052XX", "M05C", "M05X", "M062XC",
@@ -49,9 +49,9 @@ class MolproHarness(ProgramHarness):
     }

     # Currently supported methods in QCEngine for Molpro
-    _scf_methods = {"HF", "RHF", "KS", "RKS"}
-    _post_hf_methods = {'MP2', 'CCSD', 'CCSD(T)'}
-    _supported_methods = {*_scf_methods, *_post_hf_methods}
+    _scf_methods: Set[str] = {"HF", "RHF", "KS", "RKS"}
+    _post_hf_methods: Set[str] = {'MP2', 'CCSD', 'CCSD(T)'}
+    _supported_methods: Set[str] = {*_scf_methods, *_post_hf_methods}

     class Config(ProgramHarness.Config):
         pass
@@ -113,10 +113,10 @@ class MolproHarness(ProgramHarness):
                 extra_infiles: Optional[List[str]] = None,
                 extra_outfiles: Optional[List[str]] = None,
                 as_binary: Optional[List[str]] = None,
-                extra_commands=None,
+                extra_commands: bool = None,
                 scratch_name: Optional[str] = None,
                 scratch_messy: bool = False,
-                timeout: Optional[int] = None):
+                timeout: Optional[int] = None) -> Tuple[bool, Dict[str, Any]]:
         """
         For option documentation go look at qcengine/util.execute
         """

Found, not found, and incomplete programs

As was mentioned in #212, some programs require additional dependencies to function beyond the executable or Python project. Two examples are Orca (CCLib) and NWChem (networkx). The CLI support for this is fairly straightforward as shown in #212 by @loriab:

>>> Program information
Available programs:
dftd3 v3.2.1
mp2d v1.1

Incomplete programs (see devtools/conda-envs for install help):
nwchem -- needs networkx
orca -- needs cclib

Other supported programs:
cfour entos gamess molpro mopac nwchem openmm psi4 qchem rdkit terachem torchani turbomole

The main question is if we need to expand the found() syntax or some other resource to determine the difference between a "runnable" state and "found, but incomplete". A few proposals:

Leave found() as is and have a new function runnable() that includes dependencies in its checks.
Add an additional kwarg to found(include_dependancies=True).
Have found() return an enum or similar object that contains a variety of states True, found_executable, found_dependancies, None.
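
The enum proposal might look roughly like the sketch below. The ProgramState enum and the program_state function are hypothetical, and the check assumes an executable on PATH plus top-level Python packages as dependencies.

```python
import enum
import importlib.util
import shutil
import sys
from typing import Iterable


class ProgramState(enum.Enum):
    NOT_FOUND = "not_found"    # executable missing
    INCOMPLETE = "incomplete"  # executable found, dependencies missing
    RUNNABLE = "runnable"      # executable and dependencies found


def program_state(executable: str, python_deps: Iterable[str] = ()) -> ProgramState:
    """Classify a program by its executable and top-level Python dependencies."""
    if shutil.which(executable) is None:
        return ProgramState.NOT_FOUND
    missing = [dep for dep in python_deps if importlib.util.find_spec(dep) is None]
    return ProgramState.INCOMPLETE if missing else ProgramState.RUNNABLE


# The running interpreter stands in for a real QC executable here.
print(program_state(sys.executable, ["json"]))
print(program_state("surely-no-such-binary-xyz"))
```

A boolean found() could still be derived from this by comparing against ProgramState.RUNNABLE, which keeps existing callers working.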

Per-computation settings

Settings are currently global and cannot be overridden on a per-computation basis. Local options passed into a computation should be able to override the global state to allow for more flexibility.
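
A minimal sketch of the intended precedence, with local options layered over a global default without mutating it. The GLOBAL_CONFIG contents and the resolve_config helper are illustrative assumptions, not QCEngine's configuration model.

```python
from typing import Any, Dict, Optional

# Hypothetical global defaults for the example.
GLOBAL_CONFIG: Dict[str, Any] = {"ncores": 4, "memory": 8.0, "scratch_directory": None}


def resolve_config(local_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the global config with per-computation overrides applied."""
    local_options = local_options or {}
    unknown = set(local_options) - set(GLOBAL_CONFIG)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    # Later entries win, so local options override the global values.
    return {**GLOBAL_CONFIG, **local_options}


config = resolve_config({"ncores": 1})
print(config)
```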

Option to communicate with programs via ramdisk

Disk I/O can become a bottleneck, for example in the case of MDI.

UnboundLocalError: local variable 'output_data' referenced before assignment

Describe the bug
Running jobs with qcfractal-manager, and came across this "failed" job:

[W 191116 19:55:18 managers:586] Job 369024 failed: unknown_error - Msg: QCEngine Execution Error:
    Traceback (most recent call last):
      File "/export/home/tgokey/opt/lib/python3.7/site-packages/qcengine-0.12.0-py3.7.egg/qcengine/util.py", line 74, in compute_wrapper
        yield metadata
      File "/export/home/tgokey/opt/lib/python3.7/site-packages/qcengine-0.12.0-py3.7.egg/qcengine/compute.py", line 86, in compute
        output_data = executor.compute(input_data, config)
      File "/export/home/tgokey/opt/lib/python3.7/site-packages/qcengine-0.12.0-py3.7.egg/qcengine/programs/psi4.py", line 134, in compute
        output_data["schema_name"] = "qcschema_output"
    UnboundLocalError: local variable 'output_data' referenced before assignment

There isn't anything else in the log regarding this job.

To Reproduce
Just spun up a manager... this is the failed job:

[D 191116 01:48:49 base_adapter:153] Submitted Task:
    {'id': '369024', 'spec': {'function': 'qcengine.compute', 'args': [{'molecule': {'symbols': ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'N', 'N', 'N', 'O', 'O', 'H', 'H', 'H', '
H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H'], 'geometry': [0.57426417, -1.76307953, -1.16386869, -1.47978368, -2.24619508, -2.75941109, 2.74413569, -3.24931665, -1.34716956, -1.33181318, -4.20770247, -4.5
1471714, 13.32191332, -8.35663934, -12.83410606, 11.44467574, -8.78436772, -11.03868877, 8.23420593, -6.55351199, -13.53594165, 2.90643258, -5.22572166, -3.10567323, 0.84445409, -5.70346844, -4.7099867, 12.65282185, -7.0102777, 
-15.00519621, 8.93197759, -7.90773264, -11.3598525, 10.15762671, -6.16505774, -15.27477129, 12.43132868, -5.06715346, -18.53910154, 4.67927175, -7.92177835, -9.23975043, 5.29529728, -6.8051532, -3.28146083, 0.9106814, -7.8429122
4, -6.62874052, 4.78079419, -9.28500297, -4.67875428, 12.84651209, -3.85373275, -21.03942941, 14.04207412, -6.26749508, -17.13523185, 3.46901028, -8.79729428, -7.06684771, 7.2380414, -8.46174168, -9.37478738, 3.53211844, -6.8187
0128, -10.93700297, 10.00778802, -4.91063299, -17.55305051, 0.48296456, -0.23916519, 0.21021945, -3.1766797, -1.09487934, -2.64203791, 4.34180367, -2.87932082, -0.1054238, -2.91940231, -4.57621612, -5.76946934, 15.23438845, -9.0
449066, -12.55556938, 11.91887755, -9.82707218, -9.3304528, 6.32496233, -5.88249497, -13.80579222, 6.76321643, -5.74408794, -4.29250307, 6.03371138, -7.19630468, -1.3859967, -0.26204768, -9.4189406, -5.96004396, 0.15060304, -7.2
1422316, -8.43483111, 6.5075239, -10.38278599, -4.9578624, 3.53561928, -10.48988482, -3.54591095, 11.61663072, -4.69734817, -22.47359554, 12.41252019, -1.83262451, -20.95148419, 14.81484793, -4.10679705, -21.59280923, 7.99605943
, -9.50891788, -7.97487656], 'masses': [12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 14.00307400443, 14.00307400443, 14.00307400443, 15.99491461957, 15.99491461957, 
1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.00782503223, 1.0
0782503223, 1.00782503223], 'real': [True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True,
 True, True, True, True, True, True, True, True], 'fragments': [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]], 'fragment_c
harges': [0.0], 'fragment_multiplicities': [1], 'schema_name': 'qcschema_molecule', 'schema_version': 2, 'name': 'C18H17N3O2', 'identifiers': {'molecule_hash': '56efe06472ee562b18994fa4b6334ff553f9dd23', 'molecular_formula': 'C1
8H17N3O2'}, 'comment': None, 'molecular_charge': 0.0, 'molecular_multiplicity': 1, 'atom_labels': ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '
', '', '', '', '', '', '', ''], 'atomic_numbers': [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'mass_numbers': [12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 16, 16, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'connectivity': [[0, 1, 1.0], [0, 2, 2.0], [0, 23, 1.0], [1, 3, 2.0], [1, 24, 1.0], [2, 7, 1.0], [2, 25, 1.0], [3, 8, 1.0],
 [3, 26, 1.0], [4, 5, 2.0], [4, 9, 1.0], [4, 27, 1.0], [5, 10, 1.0], [5, 28, 1.0], [6, 10, 2.0], [6, 11, 1.0], [6, 29, 1.0], [7, 8, 2.0], [7, 14, 1.0], [8, 15, 1.0], [9, 11, 2.0], [9, 18, 1.0], [10, 20, 1.0], [11, 22, 1.0], [12,
 17, 1.0], [12, 18, 2.0], [12, 22, 1.0], [13, 19, 1.0], [13, 20, 1.0], [13, 21, 2.0], [14, 16, 1.0], [14, 30, 1.0], [14, 31, 1.0], [15, 19, 1.0], [15, 32, 1.0], [15, 33, 1.0], [16, 19, 1.0], [16, 34, 1.0], [16, 35, 1.0], [17, 36
, 1.0], [17, 37, 1.0], [17, 38, 1.0], [20, 39, 1.0]], 'fix_com': True, 'fix_orientation': True, 'fix_symmetry': None, 'provenance': {'creator': 'QCElemental', 'version': 'v0.5.0', 'routine': 'qcelemental.molparse.from_schema'}, 
'id': '2847582', 'extras': None}, 'driver': 'hessian', 'model': {'method': 'b3lyp-d3bj', 'basis': 'dzvp'}, 'id': None, 'schema_name': 'qcschema_input', 'schema_version': 1, 'keywords': {'maxiter': 200, 'scf_properties': ['dipole
', 'quadrupole', 'wiberg_lowdin_indices', 'mayer_indices']}, 'extras': {'_qcfractal_tags': {'program': 'psi4', 'keywords': '2'}}, 'provenance': {'creator': 'QCElemental', 'version': 'v0.5.0', 'routine': 'qcelemental.models.resul
ts'}}, 'psi4'], 'kwargs': {'local_options': {'memory': 32.0, 'ncores': 16, 'scratch_directory': '/dev/shm', 'retries': 2}}}, 'parser': 'single', 'status': 'RUNNING', 'program': 'psi4', 'procedure': None, 'manager': 'OpenFF_Moble
y_HPC-gplogin2.gp.local-9a1f67ec-e1d2-40f6-88b4-50b87a37a760', 'priority': 0, 'tag': 'openff', 'base_result': {'ref': 'result', 'id': '4227774'}, 'error': None, 'modified_on': '2019-11-16T09:48:49.770283', 'created_on': '2019-08
-07T23:42:40.267434'

Additional context
Been running for ~24 hours, with 40 jobs completed no problem. This is the only job to do this. Using v0.12.0.

Geometry Optimization within QC Codes

Is your feature request related to a problem? Please describe.
I would like to use the optimizers baked in to QC codes to perform geometry optimizations. My concern is that using geomeTRIC could be especially inefficient with MPI codes, as it will call mpiexec very frequently.

Describe the solution you'd like
A Procedure that calls the geometry optimization for NWChem.

Describe alternatives you've considered

Adding a driver for geometry optimizations. However, none of the existing compute drivers seem to affect the geometry.
Modifying geomeTRIC/qcengine to preserve NWChem output files so it can read in restart files. I think this might be a better option than the "solution I'd like", as it could work with any code that writes restart files and would let me use geomeTRIC's optimizer.

Additional context
I've got time to work on this, just need some direction :)

Bootcamp: Changeset 2

diff --git a/qcengine/programs/empirical_dispersion_resources.py b/qcengine/programs/empirical_dispersion_resources.py
index d760d38..ac5bbf9 100644
--- a/qcengine/programs/empirical_dispersion_resources.py
+++ b/qcengine/programs/empirical_dispersion_resources.py
@@ -557,8 +557,7 @@ def from_arrays(name_hint=None, level_hint=None, param_tweaks=None, dashcoeff_su
     elif dashlevel_candidate_1 is not None and dashlevel_candidate_2 is not None:
         if dashlevel_candidate_1 != dashlevel_candidate_2:
             raise InputError(
-                f"""Inconsistent -D correction level ({dashlevel_candidate_2} != {dashlevel_candidate_1}) from name_hint ({name_hint}) and level_hint ({level_hint})"""
-            )
+                f"""Inconsistent -D correction level ({dashlevel_candidate_2} != {dashlevel_candidate_1}) from name_hint ({name_hint}) and level_hint ({level_hint})""")
     dashleveleff = dashlevel_candidate_1 or dashlevel_candidate_2
 
     allowed_params = dashcoeff[dashleveleff]['default'].keys()
diff --git a/qcengine/programs/nwchem/harvester.py b/qcengine/programs/nwchem/harvester.py
index fa3759d..72433b8 100644
--- a/qcengine/programs/nwchem/harvester.py
+++ b/qcengine/programs/nwchem/harvester.py
@@ -574,7 +574,7 @@ def harvest_hessian(hess):
     Hess file name has to be "nwchem.hess". (default)
 
     """
-    hess = hess.splitlines()
+    raise NotImplementedError()
 
 
 def harvest(p4Mol, nwout, **largs):  #check orientation and scratch files

Turbomole support

Hi,
a request for a TURBOMOLE interface came up in this geomeTRIC issue.
As I already developed a TM-wrapper for my own code I'd be willing to work on an implementation in QCEngine.
Right now, I imagine the way my wrapper works is quite incompatible with QCEngine. In my code you have to prepare a control file beforehand (through the TM utility define), and the directory containing it gets passed to the wrapper class. All contents of this directory get copied to a temporary directory, from where the calculation is actually run.

What is needed for QCEngine is probably a wrapper for define?!
What would you consider to be the minimal feature set that should be implemented?

Best regards
Johannes

Incorrect units for molecule xyz in Molpro input

Molpro reads the molecule XYZ in Angstrom by default, while QCElemental provides the coordinates in bohr by default. Therefore, either the XYZ needs to be converted to Angstrom, or the input must somehow tell Molpro that the XYZ is in bohr.
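
The first option could look roughly like the sketch below. It is only an illustration: the function name bohr_to_angstrom_xyz and the output layout are made up for this example and are not taken from the Molpro harness, and the conversion factor is hard-coded (CODATA 2014 Bohr radius) rather than pulled from QCElemental's constants.

```python
# Bohr radius in Angstrom (CODATA 2014); hard-coded here as an assumption.
BOHR_TO_ANGSTROM = 0.52917721067


def bohr_to_angstrom_xyz(symbols, geometry_bohr):
    """Format a flat [x0, y0, z0, x1, ...] geometry in bohr as XYZ lines in Angstrom.

    Hypothetical helper for illustration only.
    """
    if len(geometry_bohr) != 3 * len(symbols):
        raise ValueError("geometry must contain three coordinates per atom")
    lines = []
    for i, sym in enumerate(symbols):
        # Convert the three Cartesian components of atom i.
        x, y, z = (c * BOHR_TO_ANGSTROM for c in geometry_bohr[3 * i:3 * i + 3])
        lines.append(f"{sym} {x:.10f} {y:.10f} {z:.10f}")
    return "\n".join(lines)


# H2 with a 1.4 bohr bond length along z.
xyz = bohr_to_angstrom_xyz(["H", "H"], [0.0, 0.0, 0.0, 0.0, 0.0, 1.4])
```

Converting on the QCEngine side keeps the generated input independent of any unit keyword in the Molpro input, at the cost of one extra rounding step when the coordinates are printed.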

Bootcamp: Changeset 3

diff --git a/qcengine/programs/gamess/runner.py b/qcengine/programs/gamess/runner.py
index fd1f6ed..d284dc3 100644
--- a/qcengine/programs/gamess/runner.py
+++ b/qcengine/programs/gamess/runner.py
@@ -75,9 +75,9 @@ def build_input(self, input_model: 'ResultInput', config: 'JobConfig',
     def fake_input(self, input_model: 'ResultInput', config: 'JobConfig',
                     template: Optional[str] = None) -> Dict[str, Any]:
 
-# Note decr MEMORY=100000 to get
-# ***** ERROR: MEMORY REQUEST EXCEEDS AVAILABLE MEMORY
-# to test gms fail
+        # Note decr MEMORY=100000 to get
+        # ***** ERROR: MEMORY REQUEST EXCEEDS AVAILABLE MEMORY
+        # to test gms fail
         infile = \
 """ $CONTRL SCFTYP=ROHF MULT=3 RUNTYP=GRADIENT COORD=CART $END
  $SYSTEM TIMLIM=1 MEMORY=800000 $END
@@ -93,7 +93,6 @@ def fake_input(self, input_model: 'ResultInput', config: 'JobConfig',
 Hydrogen   1.0   -0.82884     0.7079   0.0
  $END
 """
-
         # edits to rungms
         # set SCR=./
         # set USERSCR=./
@@ -102,8 +101,7 @@ def fake_input(self, input_model: 'ResultInput', config: 'JobConfig',
         return {
             "commands": [which("rungms"), "gamess"],  # rungms JOB VERNO NCPUS >& JOB.log &
             "infiles": {
-                #"gamess.inp": infile,
-                "gamess.inp": input_model.extras['gamess.inp'],
+                "gamess.inp": infile
             },
             "scratch_directory": config.scratch_directory,
             "input_result": input_model.copy(deep=True),

Create OpenMM engine harness

We would like to be able to execute OpenMM workloads, if possible, using QCEngine. This should involve creating a qcengine.programs.model.ProgramHarness subclass. Use the existing *Harness implementations as inspiration.

This issue should define the scope for this implementation; it serves as the nexus for discussion on this addition.

List Known Packages

QCEngine should be able to list all currently found execution packages (Psi4, RDKit, etc).
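
One possible shape for this, as a rough sketch only: probe for each package either as an executable on PATH or as an importable Python module. The registry KNOWN_PROGRAMS and the function list_available_programs are hypothetical names invented for this example, not existing QCEngine API.

```python
import importlib.util
import shutil

# Hypothetical registry: program name -> (lookup kind, lookup key).
# "executable" entries are searched on PATH, "module" entries are import-checked.
KNOWN_PROGRAMS = {
    "psi4": ("executable", "psi4"),
    "rdkit": ("module", "rdkit"),
}


def list_available_programs(registry=KNOWN_PROGRAMS):
    """Return the sorted names of all programs in the registry that can be found."""
    found = []
    for name, (kind, key) in registry.items():
        if kind == "executable":
            available = shutil.which(key) is not None
        else:
            # find_spec locates the module without importing it.
            available = importlib.util.find_spec(key) is not None
        if available:
            found.append(name)
    return sorted(found)


available = list_available_programs()
```

Checking with find_spec rather than a real import keeps the listing cheap, since some of these packages are slow to import.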

Error Handling

QCEngine currently assumes that errors should be passed back upstream to be handled by the calling program. An option should be added so that errors can instead be raised by compute and compute_procedure.
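
A minimal sketch of what such an option might look like. The flag name raise_error, the stand-in run_program function, and the layout of the returned error dictionary are all assumptions made for illustration, not the existing compute signature.

```python
def run_program(input_data):
    """Hypothetical stand-in for the call into the underlying QC program."""
    if "molecule" not in input_data:
        raise KeyError("input is missing 'molecule'")
    return {"return_result": 0.0}


def compute(input_data, raise_error=False):
    """Run a computation; either return errors in the output or raise them.

    raise_error=False keeps the current behaviour (errors travel upstream
    inside the returned dictionary); raise_error=True re-raises them.
    """
    try:
        result = run_program(input_data)
    except Exception as exc:
        if raise_error:
            raise
        error = {"error_type": type(exc).__name__, "error_message": str(exc)}
        return {**input_data, "success": False, "error": error}
    return {**input_data, **result, "success": True}


failed = compute({})                   # error is reported in the output
succeeded = compute({"molecule": {}})  # normal path
```

Defaulting the flag to False would leave callers that already inspect the returned error unaffected.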

Traceback is not complete.

Currently, if yield in utils.compute_wrapper raises an exception, compute_procedure raises an uninformative UnboundLocalError and the error_message does not get printed out:

distributed.worker - WARNING -  Compute Failed
Function:  compute_procedure
args:      ({'schema_name': 'qc_schema_optimization_input', 'schema_version': 1, 'keywords': {'coordsys': 'tric', 'constraints': {'set': [{'type': 'dihedral', 'indices': [3, 5, 7, 6], 'value': '180,0'}, {'type': 'dihedral', 'indices': [5, 7, 6, 8]}]}, 'program': 'rdkit'}, 'qcfractal_tags': {'procedure': 'optimization', 'keywords': {'coordsys': 'tric', 'constraints': {'set': [{'type': 'dihedral', 'indices': [3, 5, 7, 6], 'value': '180,0'}, {'type': 'dihedral', 'indices': [5, 7, 6, 8]}]}, 'program': 'rdkit'}, 'program': 'geometric', 'qc_meta': {'driver': 'gradient', 'method': 'UFF', 'basis': '', 'options': None, 'program': 'rdkit'}, 'tag': None}, 'initial_molecule': {'symbols': ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'N', 'O', 'H', 'H', 'H', 'H', 'H', 'H', 'H'], 'geometry': [1.5068158, 2.15098616, 0.22531301, 0.67956586, 0.88903407, 2.38658242, 0.82729471, 1.26190607, -2.16125162, -0.82724459, -1.26193986, 2.16127032, -0.67959486, -0.88903165, -2.38659572, -1.50683663, -2.15095462, -0.22531796, -3.8732
kwargs:    {}
Exception: UnboundLocalError("local variable 'output_data' referenced before assignment",)
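
The failure mode can be reproduced outside QCEngine. The sketch below is a simplified model, not the actual code in utils: a context manager swallows the exception and records it, and the caller then touches a variable that was only assigned inside the with block. Binding output_data before entering the block is one possible fix, shown here.

```python
import traceback
from contextlib import contextmanager


@contextmanager
def compute_wrapper():
    """Simplified model of a wrapper that records, rather than propagates, errors."""
    metadata = {"success": False, "error_message": None}
    try:
        yield metadata
        metadata["success"] = True
    except Exception:
        # The exception is suppressed here; only the traceback text survives.
        metadata["error_message"] = traceback.format_exc()


def compute_procedure(input_data, procedure):
    # Bind output_data up front. Without this line, a failure inside the
    # with block leaves the name unassigned and the code below raises
    # UnboundLocalError instead of reporting the real error.
    output_data = dict(input_data)
    with compute_wrapper() as metadata:
        output_data = procedure(input_data)
    output_data["success"] = metadata["success"]
    if not metadata["success"]:
        output_data["error_message"] = metadata["error_message"]
    return output_data


def failing_procedure(_input_data):
    raise RuntimeError("boom")


report = compute_procedure({"schema_version": 1}, failing_procedure)
```

With this change the original traceback ends up in report["error_message"], so the caller sees the RuntimeError rather than a secondary error.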

Old environment variables

There are still several old DQM-style environment variables to remap and deprecate, such as DQM_CONFIG_PATH.

Cache `openmm_system` upon creation in `OpenMMHarness`

During #151, caching of the openmm_system was proposed as an optimization that could be valuable when we compute many gradients/energies for the same molecule in the same set of jobs. We currently cache the generated off_forcefield, so the same mechanism can be utilized for openmm_system.

The key/hash used for the cache must be selected with care. It must be insensitive to rotations or translations of the molecule, but should be sensitive to charge states, connectivity, and forcefield parameters. From @peastman:

One option for this is to serialize the System to XML with XmlSerializer.serialize(system) then compute a hash from the string. This will detect any change to the System or the Forces it contains, but will be unaffected by changes to particle positions.