graddft's Issues

There are still checkpoints and other files in the repo

Running du -h . in the root of this repo returns:

 32K	./grad_dft/interface
 28K	./grad_dft/utils
1.5M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21mu/variables
1.7M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21mu
1.5M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21mc/variables
1.7M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21mc
1.5M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21m/variables
1.7M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21m
1.5M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21/variables
1.7M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints/DM21
6.7M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21/checkpoints
6.8M	./grad_dft/external/density_functional_approximation_dm21/density_functional_approximation_dm21
 16K	./grad_dft/external/density_functional_approximation_dm21/cc
4.0K	./grad_dft/external/density_functional_approximation_dm21/.vscode
6.9M	./grad_dft/external/density_functional_approximation_dm21
6.9M	./grad_dft/external
7.2M	./grad_dft
 16K	./tests/unit
 48K	./tests/integration
 64K	./tests
1.5M	./models/DM21_model/variables
1.7M	./models/DM21_model
1.7M	./models
 64K	./image/README
 64K	./image
 20K	./examples/intermediate_examples
 48K	./examples/article_experiments
 32K	./examples/basic_examples
 28K	./examples/advanced_examples
344K	./examples
8.0K	./.github/workflows
8.0K	./.github
161M	./.git/objects/pack
  0B	./.git/objects/info
161M	./.git/objects
4.0K	./.git/info
4.0K	./.git/logs/refs/heads
4.0K	./.git/logs/refs/remotes/origin
4.0K	./.git/logs/refs/remotes
8.0K	./.git/logs/refs
 12K	./.git/logs
 60K	./.git/hooks
4.0K	./.git/refs/heads
  0B	./.git/refs/tags
4.0K	./.git/refs/remotes/origin
4.0K	./.git/refs/remotes
8.0K	./.git/refs
161M	./.git
 48K	./data/raw/dissociation
 84K	./data/raw
 84K	./data
171M	.

The question, @PabloAMC, is: do we need the checkpoints in the DM21 external folder? We certainly don't need the huge .pack objects in .git; I think these were committed by mistake. Typically, such files would be listed in .gitignore.

Testing the Non-XC part of the total energy

In #17, one of the points raised was about the correctness of the non-XC terms in the total energy.

Mostly, I was concerned with the inclusion of two-body terms, because these do not exist in DFT (outside the scope of the XC functional, at least). However, it appears that removing these terms makes the energy completely wrong, so I think my concern stemmed from a misnomer in the language used by PySCF rather than from any actual numerics being incorrect.

To make entirely sure that we are doing things correctly here, I recommend implementing tests for the non-XC total energy by checking that the dissociation curves of 3 diatomic molecules match the results from PySCF. Runs with no XC functional can be performed in PySCF using a dummy functional:

import numpy as np
from pyscf import gto, dft

mol = gto.M(
    atom = '''
    Li  0.   -0.37   0.
    F   0.    0.98   0.
    ''',
    basis = 'ccpvdz')

def zero_xc(xc_code, rho, spin=0, relativity=0, deriv=1, omega=None, verbose=None):
    # A fictitious XC functional that returns zero energy and potentials everywhere
    rho0, dx, dy, dz = rho[:4]
    vlapl = None
    vtau = None
    fxc = None
    kxc = None
    vgamma = np.zeros(shape=rho0.shape)
    vrho = np.zeros(shape=rho0.shape)
    exc = np.zeros(shape=rho0.shape)
    vxc = (vrho, vgamma, vlapl, vtau)
    return exc, vxc, fxc, kxc

mf = dft.RKS(mol)
mf = mf.define_xc_(zero_xc, 'GGA')
truth = mf.kernel(max_cycle=500)

We can then check the error (in the total energy) of our code by running:

HF_molecule = molecule_from_pyscf(mf, scf_iteration=500)
print(HF_molecule.nonXC() - truth)

A good benchmark for judging this error is chemical accuracy (1 kcal/mol ≈ 1.6 mHa), but in practice I hope to see roughly 100x less error than that.
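
A test along these lines might look like the following sketch, where the bond-length grid, the LiF example, and the 1e-5 Ha threshold are placeholders, the import path for molecule_from_pyscf is assumed, and zero_xc is the dummy functional defined above:

import numpy as np
from pyscf import gto, dft
from grad_dft import molecule_from_pyscf  # import path assumed

def test_non_xc_energy_matches_pyscf():
    # Placeholder bond-length grid (Angstrom) along the LiF dissociation curve
    for r in np.linspace(1.2, 3.0, 5):
        mol = gto.M(atom=f"Li 0. 0. 0.; F 0. 0. {r}", basis="ccpvdz")
        mf = dft.RKS(mol)
        mf = mf.define_xc_(zero_xc, "GGA")  # dummy zero functional from above
        truth = mf.kernel(max_cycle=500)
        molecule = molecule_from_pyscf(mf, scf_iteration=500)
        # Target: errors well below chemical accuracy (~1.6 mHa)
        assert abs(molecule.nonXC() - truth) < 1e-5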

Simplify README.md

Presently, the README.md contains too much mathematics. It should just show the simplest way to use the code, along with install instructions.

Adding install instructions

There is no requirements.txt in the repo root (~), nor any install instructions in README.md.

In further issues, we should also consider/discuss adding a setup.py (a rough sketch is below) and the other steps needed to make this package installable through PyPI.
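
For discussion, a minimal sketch of what such a setup.py might look like; the package name, version, and dependency list below are assumptions, not decisions:

from setuptools import setup, find_packages

setup(
    name="grad_dft",                        # assumed package name
    version="0.1.0",                        # placeholder version
    packages=find_packages(exclude=["tests", "examples"]),
    install_requires=["jax", "pyscf"],      # to be filled in from requirements.txt
    python_requires=">=3.9",
)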

Implement new loss functions in `train.py`

Presently, we have the "energy-only" loss in functional.py, currently named default_loss. We should move it into train.py and rename it energy_loss. In total, we should have 3 loss functions (a rough sketch of all three is given at the end of this issue):

  1. energy_loss: can be used with both self-consistent and non-self-consistent training.
  2. density_loss: only for use in self-consistent training (and maybe Harris-Foulkes training, if we implement it). Anything to do with the density in non-self-consistent training doesn't make sense, as we never update the 1-RDM, so the predicted density is static during training.
  3. energy_and_density_loss: for use in self-consistent training only, for the same reasons as 2.

There are some more considerations here as to what to do with the stop_gradient on the non-XC energy when using the different loss functions, but this will be handled in another issue.

We also have the "implied self consistency" approach I recommended some time ago. I've attached the whiteboard scrawling for this below:

[Attached: whiteboard photo, PXL_20230823_211058359.MP]

Long story short, self-consistency can be bypassed during training if we calculate the total energy by passing it a true many-body density. You then match the predicted density one step forward in an SCF cycle to the true density (the second term in the equation). This loss has the same minimum as fully self-consistent training.
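
For concreteness, here is a minimal sketch of the three proposed losses. The predict(params, molecule) -> (energy, rdm1) signature and the molecule.energy / molecule.rdm1 ground-truth fields are hypothetical stand-ins for whatever train.py settles on:

import jax.numpy as jnp

def energy_loss(params, predict, molecule):
    # Usable in both self-consistent and non-self-consistent training
    energy, _ = predict(params, molecule)
    return (energy - molecule.energy) ** 2

def density_loss(params, predict, molecule):
    # Only meaningful when the predicted 1-RDM is updated during training
    _, rdm1 = predict(params, molecule)
    return jnp.mean((rdm1 - molecule.rdm1) ** 2)

def energy_and_density_loss(params, predict, molecule, alpha=0.5):
    # alpha is an assumed weighting between the two terms
    e = energy_loss(params, predict, molecule)
    d = density_loss(params, predict, molecule)
    return alpha * e + (1 - alpha) * d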

Stable `eigh` for degenerate eigenvalues

When implementing self-consistent training methods, I thought I had fixed the NaN-gradients problem, but in reality, I only partly fixed it.

The core issue is that the reverse-mode Jacobian is undefined for degenerate eigenproblems: the eigenvector term of the gradient contains a factor $1/(\lambda_i - \lambda_j)$ for eigenvalues $\lambda$, which diverges when $\lambda_i = \lambda_j$.

I have implemented a "safe version" of the reverse-mode Jacobian which uses the Lorentzian broadening approach to mitigate this. I used the implementation suggested here, slightly modified for symmetric matrix inputs.

Broadening is applied only when eigenvalue differences fall below a user-set tolerance, so this method produces the same gradients as jnp.linalg.eigh for non-degenerate problems.
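
To illustrate the trick (a sketch of the broadening idea, not the repo's actual safe_eigh), the factor $1/(\lambda_j - \lambda_i)$ in the pullback is replaced by the Lorentzian-broadened $(\lambda_j - \lambda_i)/((\lambda_j - \lambda_i)^2 + \epsilon)$ only for (near-)degenerate pairs:

import jax.numpy as jnp

def safe_f_matrix(eigvals, tol=1e-8, eps=1e-12):
    # Off-diagonal factor F_ij ~ 1/(lam_j - lam_i) used in the eigh pullback
    diff = eigvals[None, :] - eigvals[:, None]   # diagonal is exactly zero
    near = jnp.abs(diff) < tol                   # (near-)degenerate pairs
    # Double-where pattern so neither branch yields NaN/inf under autodiff
    denom = jnp.where(near, 1.0, diff)
    broadened = diff / (diff ** 2 + eps)         # Lorentzian: 1/x -> x/(x^2 + eps)
    return jnp.where(near, broadened, 1.0 / denom)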

Design a pretty icon for the repo README.md

Discuss ideas for this.

I also think we should include a gif animation. Something like a heatmap or isosurface of the charge density evolving during training of a neural functional would be pretty neat.

We should come back to this once higher-priority issues are dealt with.

Make an example notebook displaying the different kind of loss functions

I want a notebook where we test using cost functions that:

(1) Train only the energy in a non-self-consistent way

(2) Train only the energy in a self-consistent way

(3) Train the energy and density in a way which enforces self-consistency

The example will train on the binding curve of H2 and test on the binding curve of LiH. I also wish to generate a gif showing the reduction of error in the LiH density at some bond length as a function of training epochs on H2. This would make a cool .gif animation for the repo README.md.

Fix unstable B88 tests

The B88 tests sometimes pass and sometimes fail. Will try fixing the grid level and reducing the tolerance.

Numerical stability during autodifferentiation of eigh

In #17 we discussed that it would be quite important to implement the training self-consistently. Unfortunately, I was having lots of NaN-related issues when trying to differentiate through the self-consistent loop. With a simple example, I have been able to narrow the problem down to an external implementation of the generalized eigenvalue problem, provided in https://github.com/XanaduAI/DiffDFT/blob/main/grad_dft/external/eigh_impl.py and originally reported in https://gist.github.com/jackd/99e012090a56637b8dd8bb037374900e. The author also has some notes in https://jackd.github.io/posts/generalized-eig-jvp/.
The problem is that jax.scipy.linalg.eigh only supports the standard problem (b = None), even though this is not stated in the docs: https://jax.readthedocs.io/en/latest/_autosummary/jax.scipy.linalg.eigh.html.
Substituting the line https://github.com/XanaduAI/DiffDFT/blob/da7d384fcb1bae547fa9370ae7846bf939a6d580/grad_dft/evaluate.py#L544 with anything else removes the error. For example, even though it is nonsense, I can do

mo_energy = fock[:,:,0]
mo_coeff = fock + fock**2

which preserves the right matrix shapes and retains the dependency on the Fock matrix (so gradients are not 0). Then, for a very simple LDA model with a single layer, the gradients all look good even when the number of SCF iterations is 35, though it breaks somewhere between 35 and 40 SCF iterations.
I think solving this should be a high-priority problem now.
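
For reference, one standard workaround (a sketch, not necessarily what we should adopt) is to reduce the generalized problem $FC = SC\epsilon$ to a standard symmetric one via a Cholesky factorization of the overlap matrix $S = LL^T$, so jnp.linalg.eigh can be differentiated directly. Note that the degenerate-eigenvalue gradient issue discussed in other issues still applies:

import jax.numpy as jnp
import jax.scipy.linalg as jsl

def generalized_eigh(fock, overlap):
    # Overlap Cholesky: S = L L^T
    chol = jnp.linalg.cholesky(overlap)
    # Transform to a standard symmetric problem: A = L^-1 F L^-T
    linv_f = jsl.solve_triangular(chol, fock, lower=True)
    a = jsl.solve_triangular(chol, linv_f.T, lower=True).T
    mo_energy, y = jnp.linalg.eigh(a)
    # Back-transform the eigenvectors: C = L^-T Y
    mo_coeff = jsl.solve_triangular(chol.T, y, lower=False)
    return mo_energy, mo_coeff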

Implement a limited number of high priority unit tests

There are presently no unit tests at all in the repo. It is very likely that we will not have time to achieve high test coverage, but we should identify some key areas to test.

As I work through the codebase internals, it will become clearer which parts these are, and we can discuss them below.

The return of the NaN

I've noticed that, for a few basis sets, NaN gradients are appearing again when training with the DIIS SCF loop, but not with the linear-mixing loop.

I think this is likely because degenerate eigenvectors/eigenvalues are being encountered in the calls to the jnp.linalg.eigh routines. We can try switching these out for our custom safe_eigh, which will hopefully fix the bug.

Make linear mixing SCF code Jittable

Much like the jittable DIIS code, it would be nice to have the linear-mixing code jittable too. Linear mixing, while slower, is typically less prone to instability, so it's good to have in our toolbox.
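
A minimal sketch of what a jittable linear-mixing driver could look like, using jax.lax.fori_loop; make_fock and rdm1_from_fock are hypothetical stand-ins for the real Grad-DFT routines:

import jax

def make_linear_mixing_scf(make_fock, rdm1_from_fock, alpha=0.3, n_iters=50):
    @jax.jit
    def scf(rdm1_init):
        def body(_, rdm1_old):
            fock = make_fock(rdm1_old)       # build the Fock matrix from the current 1-RDM
            rdm1_new = rdm1_from_fock(fock)  # diagonalize and rebuild the 1-RDM
            # Linear mixing: rho_out = (1 - alpha) * rho_old + alpha * rho_new
            return (1.0 - alpha) * rdm1_old + alpha * rdm1_new
        return jax.lax.fori_loop(0, n_iters, body, rdm1_init)
    return scf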

Reduce CI costs

macOS tests in the CI are expensive. Let's test only on Ubuntu and change the Actions trigger to pull requests rather than pushes.

Add license and licensing strings to python modules

Proceeding with Apache 2.0, we should include this license in the root directory and place the strings below in the header of all source files:

# Copyright 2023 Xanadu Quantum Technologies Inc.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Formatting and linting

Xanadu's open-sourcing policy requires that we lint the code. I'm a fan of Black, so unless anybody is opposed, I will use this tool.

I will also add a pre-commit hook:

#!/bin/sh
# Format every staged Python file with Black, then re-stage it

set -e

git diff --staged --name-only --diff-filter=d -- "*.py" | while IFS= read -r file; do
  black "$file"
  git add "$file"
done

This ensures that new commits are always formatted.

Give credit to PySCF in accordance with its licensing

PySCF is provided under an Apache 2.0 license, and we have translated a large portion of it into JAX. We should therefore acknowledge this in accordance with the software license wherever needed.

Make a differentiable SCF procedure

Presently, a single iteration of the SCF procedure is fully differentiable and produces stable gradients.

However, when we update the relevant parameters in a new iteration of the loop using the logic of some SCF procedure (like DIIS, for example), gradient computations fail, as these updates are not presently fully differentiable in the code.

DIIS is a fairly advanced SCF iterator as they go, so we can try something much simpler, like linear mixing. I.e., update the charge density (equivalently, the 1-RDM) each iteration as:

$$ \rho_{out} = (1 - \alpha) \rho_{old} + \alpha \rho_{new} $$

where we pick some mixing parameter $0 \leq \alpha \leq 1$.

This is arguably the most stable and reliable (but slow) way of performing the SCF procedure.

Correctness checks

Grad-DFT achieves something fairly complex. That, alongside the fact that (i) development occurred without any unit testing and (ii) it has not been exposed to a large number of users, means that the probability that things (perhaps even important things) are wrong is quite high.

In this issue, I will post comments/questions about things I either don't understand or believe are incorrect. If we agree that something is indeed wrong, it will be raised in a separate issue and corrected.

Fix failing examples

A number of the examples in the ~/examples directory do not finish without errors.

More details will be given below on a module by module basis.

The repo is large and contains things that should probably live elsewhere

Running

du -h ~/.

shows:

533M	./DiffDFT

This is very large for a code repo. Tracking down the source of the data, it seems most of it is in:

~/checkpoint_dimers

and

~/ckpt_dimers

I assume these are the raw results of experiments. They should be moved elsewhere (Zenodo, once the paper is released), as this repo is for the code.

Implement the Harris-Foulkes energy

In #17, a concern was raised about the use of self-consistency in training. While I think we should still look into implementing this, a good workaround is to use the Harris-Foulkes functional instead of the energy functional.

When self-consistency has not been reached, this gives a much better estimate of the electronic energy than the DFT energy functional does.

I cannot see whether this is implemented in PySCF, though, so it could take a little more thought.
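
For reference, the textbook form of the Harris-Foulkes estimate evaluated at an input density $\rho_{in}$ (omitting the nuclear repulsion term; this is standard theory, not a quote from any codebase) is:

$$ E_{Harris}[\rho_{in}] = \sum_i \epsilon_i - E_H[\rho_{in}] + E_{xc}[\rho_{in}] - \int v_{xc}[\rho_{in}](\mathbf{r}) \, \rho_{in}(\mathbf{r}) \, d\mathbf{r} $$

where the $\epsilon_i$ are the occupied eigenvalues of the Kohn-Sham Hamiltonian built from $\rho_{in}$ and $E_H$ is the Hartree energy. Because the estimate is stationary to first order in the density error, it is well suited to non-self-consistent densities.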

Demonstrate parallel execution of a loss function

On an HPC cluster, each term in a mean-square loss can be calculated in an embarrassingly parallel fashion.

Unfortunately, the native way of doing this with JAX (using jax.vmap and jax.pmap) is not compatible with the input we must parallelize over: the Molecule object. This is because its data is stored in a "ragged" structure, i.e., the dimensions of the grid for one molecule very often differ from those for another, and likewise for the 1-RDM, so jnp.array([rdm1_1, rdm1_2]) will not work.

This means that for loss parallelism, we need to think differently. Sharding may be the way forward, but this requires more thought. A good reference is here: https://jax.readthedocs.io/en/latest/notebooks/Distributed_arrays_and_automatic_parallelization.html

I don't think we will get around to solving this problem before our release deadline, but if we want to do something with HPC, getting this right is non-negotiable.
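
Until then, one workaround worth prototyping (a sketch of my own suggestion, not an existing Grad-DFT API) is to pad the ragged per-molecule arrays to a common shape with a validity mask so that jax.vmap applies; whether the padding overhead is acceptable for realistic grids is untested:

import jax
import jax.numpy as jnp

def pad_1d(arr, size):
    # Zero-pad a per-grid-point array up to a common length
    return jnp.pad(arr, (0, size - arr.shape[0]))

def batch_grids(weights):
    # weights: list of ragged 1D arrays (e.g. grid weights), one per molecule
    n = max(w.shape[0] for w in weights)
    batch = jnp.stack([pad_1d(w, n) for w in weights])
    mask = jnp.stack([pad_1d(jnp.ones_like(w), n) for w in weights])
    return batch, mask  # padded points carry mask == 0 and drop out of sums

# Per-molecule reductions then vmap cleanly, e.g.:
# integrals = jax.vmap(lambda w, m: jnp.sum(w * m))(batch, mask)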

Implementing a documentation method

While I can see a few examples for using the code, there is no formal ~/docs folder.

What we use for this will depend on need, but I am most familiar with setting this up using Sphinx. We could then consider hosting the Sphinx docs at the github.io address associated with the future public repository.
