Code Monkey home page Code Monkey logo

Comments (39)

leeping avatar leeping commented on August 26, 2024 2

I think this might be @jmaat 's project, not @vtlim 's but correct me if I'm wrong.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024 1

Hmm, that didn't show up for me originally. I have it now.

Edit: Yea for 2D we are talking about thousands of gradient evaluations. Let's parallelize that.

from qcfractal.

loriab avatar loriab commented on August 26, 2024 1

Ok, good! Yes, it's in psi4. Medium term, it'll move to the common driver repo, but it'll always be available in a pure-python psi4/qca repo. To keep the topic in one place, here's a test case. Ping me with questions if you ever want to use it.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

Here is what I'm imagining:

  1. One of the optimizations on the grid will be run first, starting from the user-provided structure, and all the others would depend on a previous one. There is a dependence graph structure to the calculations (see image).
  2. The user should be able to provide an option called absolute or relative. In the case of absolute, the first optimization is constrained to the nearest grid point, and the constraint values are absolute. In the case of relative, the first optimization is fully relaxed, and the constraint values are relative to the first optimized structure.
  3. The user should provide a list of lists of constraint values, starting with the leading dimension.

wechatimg2

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

It might be good to start with the 1D case and move to 2D.

In general I think this should likely support n-dimensions and arbitrary combinations of bond/angle/torsion scans?

Looking at the potential cost we may consider driving this as a service just so that we can checkpoint the jobs.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

Sure, I agree starting from 1D is a good idea. Jessica Maat in the Mobley group is already starting to generate and use 2D data so it would be optimal to support that as well.

  • Lee-Ping

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Apologies, I meant 1D when describing the spec.

What we are really after her is how should access this grid data? It seems that you have primary and trailing axes, but for example you likely do not want to access that did as something like optimization_grid[0][0]. This one isn't nearly as obvious to as a single optimization trajectory or a TorsionDrive as those have well defined labels.

Should each part of the resulting grid just be a single Optimization object, or do we want some metadata there as well? The best way to get at this layout is through use cases, so it might be worth thinking about how you would like to access the data.

For TorsionDrives we a function like:

>>> torsiondrive.list_final_energies()
{"(-180, )" : -10, "(180, )": 10} # key: energy

>>> torsiondrive.list_final_results()
{"(-180, )" : ResultsObject, "(180, )": ResultsObject} # key: full results object

where the ResultsObjects have everything from final energies to Wiberg Bond orders.

What would be analogous here?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

I think each part of the resulting grid should be an Optimization object; currently drawing a blank on what other data might be needed at the individual grid points.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Right, so each will be an Optimization object (we will have shortcuts to energy etc). But what is the label for each index? For torsiondrive the torsions had to be integers so that worked out as keys, here we will have floats for bonds and perhaps angles (I believe) so we will need a key like (0, 0). How will the user understand at what the distance was frozen at for the bond? In this grid which was the previous computation etc?

Hopefully this is helping, we can schedule a call soon if not!

Pinging @vtlim as well.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

This definitely helps. One possibility is we add a method called GridOptimization.get_constraint_values() and that will return a dictionary that maps keys like (0, 0) to values like (1.81, 150.0). What do you think?

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Sure, we can do that. In that case we will also need some kind of description of the what each dimension means in that case. How would we specify this?

grid = [{"type": "bond", "range": [1.1, 1.2], "step": 1.e5}]

What other kind of metadata will we need here?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

I think that having unevenly spaced points (within a dimension) could be useful in certain use cases other than ours (for example, if you were generating structures for interaction energy calculations). Could we provide the full list of values for each dimension in the output?

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Sure we would need to differentiate between scan and perhaps another keyword that explicitly defines a list, perhaps enumerate?

We can add a category or something like this which will differentiate the two. Keywords need some overhaul, but a feasible concept.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

I would be happy with having just enumerate or both. The advantage of enumerate is that you could see the numbers themselves, e.g. 0.0 0.1 0.2 0.3 0.4 0.5. With geomeTRIC's scan I found it's easy to generate grid points that are different from the original intent, e.g. a grid of 10 points from 0.0 to 1.0 would end up giving 0.0 0.1111 0.2222 0.3333 ... 0.8888 1.0.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

In that case we might just have:

grid = [{"type": "bond", "indices": [0, 1], "steps": [1.0, 1.01, 1.02, 1.1]}, ...]

Where the result would be the Cartesian product between all elements in the grid. Is it possible you want to do sparse grids?

Looking at your diagram again, we may be able to parallelize this where say if I am doing a 2D scan I can do grid (0,0) then spawn (0, 1) and (1, 0). Is this true?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

I have no plans to do sparse grids at the moment. I could imagine some applications, for example if we wanted to scan the conformations of a ring (that sort of gets into torsiondrive territory).

(The above is not truly a sparse grid, but an example where we might want to provide a list of grid points, as one would in a sparse grid.)

I think that in a 2D grid that's a Cartesian product between two 1D grids, the maximum number of jobs that could run in parallel is 2*len(leading_dimension).

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Interesting ok, between the possible parallelization and the snapshotting we might try to implement this as a service.

I think this is mostly what I need to give it a first pass. @vtlim do you have a small test case (~5 heavy atoms) for 1 and 2D scans you might want?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

scan-1D.xyz: Example of a 1D scan of the improper dihedral angle involving indices 3, 1, 0, 12. 20 points from -40 to +40 degrees. (I would personally have used 21 points :)

scan-2D.xyz: Leading dimension is the improper dihedral angle involving indices 3, 1, 0, 12; 8 points from 0 to -70 degrees. Trailing dimension is the angle involving atoms 0, 3, 1; 11 points from 100 to 150 degrees.

Archive.zip

from qcfractal.

leeping avatar leeping commented on August 26, 2024

geomeTRIC's scan feature is only able to start from one corner of the grid, and it only supports "absolute" and not "relative" constraint values. Thus, the GridOptimization feature request should generate slightly different (and superior) results compared to the ones I uploaded above.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Ah, I apologize if I switched the projects! Do you have a specific molecule in mind?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

No problem, the molecule from @jmaat is provided in the zipped file above. :)

from qcfractal.

vtlim avatar vtlim commented on August 26, 2024

Yes, this is @jmaat’s project.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

@jmaat @leeping Will the optimization method for the grid and the initial optimization ever differ or should we trust that these two be the same?

Allowing them to be different isn't hard to do, but it doesn't seem like it should be something that we should allow in general.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

I don't think the optimization methods for grid and initial optimization will ever be different (save for different constraints; the grid will have more).

I do think it will be helpful to support "freeze" or "scan" constraints for both the initial optimization and the grid, what do you think? These constraints should be different from those that are being scanned on the grid.

from qcfractal.

jmaat avatar jmaat commented on August 26, 2024

@dgasmith

This is a follow up from our meeting yesterday. For the improper project, in immediate we need (1) optimized geometries and (2) energies. We also eventually will need (3) Wiberg bond order as output. We also will need (4) all of the quantities @leeping needs for valence parameters, such as vibrational frequencies.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Great thanks!

Wiberg is no problem, but the vibrational frequencies will take a bit more time (though we could write a simple wrapper to generated these from Hessians).

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

@leeping When running over this grid we have shown two ways to do this:

  1. Starting along the first dimension and then propagating along the second dimension as the first completes:
 O -> O -> O -> X
 | -> O -> X
 O -> X
 |
  1. Starting along the first dimension and then propagating front he middle of the second (similar to the hand drawing):

 X <- O -> X
      |
      X

Effectively, do we need to allow for two directions from a single starting point, or just one? The first is pretty easy to code up since we just have lists to loop over:

grid = [{"type": "bond", "indices": [0, 1], "steps": [1.0, 1.01, 1.02, 1.1]},
        {"type": "dihedral", "indices": [0, 1, 2, 3], "steps": [110, 125, 130, 125]}]

Note the steps list is not necessarily monotonic or uniform (note this is both good and bad, do we need to check this?). Is this bidirectionally something we should consider?

from qcfractal.

loriab avatar loriab commented on August 26, 2024

though your use case probably has additional complications, here's some frequencies-from-Hessian code in python, in case it's handy: https://github.com/psi4/psi4/blob/master/psi4/driver/qcdb/vib.py#L307.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

I think the grid points in each dimension should be sorted prior to starting the procedure. The results could be returned to the user in sorted order. If the user originally provided non-monotonic inputs, then they could recover the original order later on. At the moment I don't see the use case for non-monotonic values on the grid.

If a repeated value occurs in the steps field, an error should probably be thrown (as shown above, 125 is repeated in the dihedral angle).

Constrained optimizations converge fastest when you start close to satisfying the constraint, so I think it is best to start at whichever grid point is closest to the un-constrained optimized geometry. Most often this means the propagation should start from somewhere in the middle, though not necessarily the exact center. I also thought it'd be useful if the user could specify absolute vs. relative, where the grid points provided in relative are offsets from the un-constrained optimized value. For example:

grid = [{"type": "bond", "indices": [0, 1], "values": "relative", "steps": [-0.1, -0.05, 0, 0.05, 0.1]}]

from qcfractal.

leeping avatar leeping commented on August 26, 2024

Thank you @loriab , that is very helpful. :) The harmonic analysis we want should be rather simple and I don't foresee any complications with our use cases. If this code is part of Psi4, then I think we could simply run it on the Hessians downloaded from QCArchive and obtain the frequencies / vibrational modes that we need.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

@leeping Ok, that makes sense and brings together several other questions. Will progress in that direction.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

From #141 and #152 I believe this is effectively finished. Thank you everyone for the comments and discussion on the structure of the service.

The current input structure looks like the following:

service = GridOptimizationInput(**{
    "gridoptimization_meta": {
        "starting_grid":
        "zero",
        "scans": [{
            "type": "distance",
            "indices": [1, 2],
            "steps": [1.0, 1.1]
        }, {
            "type": "dihedral",
            "indices": [0, 1, 2, 3],
            "steps": [0, 90]
        }]
    },
    "optimization_meta": {
        "program": "geometric",
        "coordsys": "tric",
    },
    "qc_meta": {
        "driver": "gradient",
        "method": "UFF",
        "basis": "",
        "options": None,
        "program": "rdkit",
    },
    "initial_molecule": mol_ret["hooh"],
})

ret = client.add_service(service)

I will be reaching out in about a week to see to begin integrating and trailing this code.

from qcfractal.

leeping avatar leeping commented on August 26, 2024

Hello Daniel,

Thank you so much for implementing this. It will be highly useful for our valence parameterizations.

Two questions:

  1. What is the meaning of "starting_grid":"zero"?
  2. Is the user able to specify whether to run an initial fully unconstrained optimization?
  • Lee-Ping

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Right now we have two guesses:

  • zero starts at simply the initial zero grid point.
  • 'relative' starts at the closest point to the input molecule.

I had forgotten about that case, I guess we should make add one more option that does this, perhaps optimized_relative?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

Thanks for the explanation, there's a chance we may be thinking about this differently. My initial plan had two separate choices:

Choice LP-1: The user decides whether the service should run an un-constrained optimization as an initial step, or use the provided structure directly. Call this result structure "A".

Choice LP-2: Whether the grid points are "absolute" geometric parameters or "relative / displacements" with respect to structure A. For example, suppose grid points [-0.1 0.0 0.1] are provided for an interatomic distance. This should makes sense for relative but not absolute.

Choice DS-1: If I understand correctly, your zero starts at the first / leftmost grid point in each dimension and your relative starts at the closest point to the initial molecule, i.e. structure "A". I assumed that we would always be making the latter choice, because the performance of constrained geometry optimization is poor when starting from a structure far from constraint satisfaction. However, I also see the benefit of having this be a choice, so it's up to you.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

I was thinking about this slightly differently and didn't quite realize that relative was with respect to optimized parameters. This makes sense and something we can work in pretty easily. I think we will make this a per-scan dimension option for lots of flexibility. Should we default to one option or make the user provide this explicitly every time?

from qcfractal.

leeping avatar leeping commented on August 26, 2024

Thanks, I think having this be a per-dimension option is a good idea.

I prefer for the user to need to make this choice every time. There are relative benefits and drawbacks to using absolute vs. relative, and it plays a role in defining the grid points.

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Ok, this should be implemented in #161. This proved to be a good test and revealed several bugs in the ecosystem. I think we should be good to go now!

from qcfractal.

dgasmith avatar dgasmith commented on August 26, 2024

Closing this issue, please open new ones if GridOptimization needs additional tweaks.

from qcfractal.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.