

Recursively embedded atom neural network

Introduction

Recursively embedded atom neural network (REANN) is a PyTorch-based, end-to-end, multi-functional deep neural network package for molecular, reactive, and periodic systems. Currently, REANN can be used to train interatomic potentials, dipole moments, transition dipole moments, and polarizabilities. Taking advantage of the DistributedDataParallel features built into PyTorch, the training process is highly parallelized on both GPUs and CPUs. For convenient MD simulation, an interface to LAMMPS has been constructed by creating a new pair_style that invokes this representation for highly efficient MD simulations. In addition, REANN has been interfaced to the ASE package as a calculator. More details can be found in the manual.

Field-induced REANN (FIREANN), developed on top of the REANN package, can describe the response of the potential energy to an external field up to an arbitrary order (dipole moments, polarizabilities, ...) in a unified framework.

Requirements

  1. PyTorch 2.0.0
  2. LibTorch 2.0.0
  3. cmake 3.1.0
  4. opt_einsum 3.2.0

Data sample

The REANN package has been embedded in GDPy, which can be used to search the configuration space and sample suitable configurations for constructing machine-learning potentials.

Training Workflow

The training process can be divided into four parts: information loading, initialization, data loading, and optimization. First, the "src.read" module loads the information about the systems and the NN structure from the dataset and the input files ("input_nn" and "input_density"), respectively. Second, the "run.train" module uses the loaded information to initialize various classes, including the property calculator, dataloader, and optimizer. For each process, an additional thread is activated in the "src.dataloader" module to prefetch data from CPU to GPU asynchronously. Meanwhile, optimization is activated in the "src.optimize" module once the first set of data has been transferred to the GPU. During optimization, a learning rate scheduler, namely "ReduceLROnPlateau" provided by PyTorch, is used to decay the learning rate. Training stops when the learning rate drops below "end_lr", and the model that performs best on the validation set is saved for further investigation.
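
As a minimal sketch of this stopping criterion (using a toy model and toy data, not the package's actual training loop), the following PyTorch snippet decays the learning rate with "ReduceLROnPlateau", checkpoints the best validation model, and stops once the learning rate falls below "end_lr":

    import torch

    # Toy stand-ins for the real REANN model and dataset (illustration only).
    model = torch.nn.Linear(4, 1)
    x, y = torch.randn(64, 4), torch.randn(64, 1)

    start_lr, end_lr = 1e-3, 1e-5
    optimizer = torch.optim.Adam(model.parameters(), lr=start_lr)
    # Halve the learning rate when the monitored loss stops improving.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=100)

    best_val = float("inf")
    for epoch in range(100000):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        val_loss = loss.item()        # a real run would evaluate a separate validation set
        scheduler.step(val_loss)      # ReduceLROnPlateau monitors the validation loss
        if val_loss < best_val:       # keep the model that performs best on validation
            best_val = val_loss
            torch.save(model.state_dict(), "best_model.pth")
        if optimizer.param_groups[0]["lr"] < end_lr:
            break                     # stop once the learning rate drops below end_lr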

How to Use REANN Package

Users can employ geometries, energies, atomic force vectors (or other physical properties that are invariant under rigid translation, rotation, and permutation of identical atoms, together with their corresponding gradients) to construct a model. There are three steps to using this package:

  1. Prepare the environment
  2. Prepare data
  3. Set up parameters

Prepare the environment

The REANN package is built on PyTorch and uses the "opt_einsum" package to optimize the einsum-like expressions that appear frequently in the calculation of the embedded density. In order to run the REANN package, users need to install PyTorch (version 2.0.0) following the instructions on the PyTorch official site, as well as the opt_einsum package.
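
After installation, a quick sanity check (a minimal snippet, not part of the package itself) confirms that both dependencies import and that a GPU is visible:

    import torch
    import opt_einsum

    print(torch.__version__)          # expected: 2.0.0
    print(opt_einsum.__version__)     # expected: 3.2.0 or later
    print(torch.cuda.is_available())  # True if GPU training is possible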

Prepare data

There are two directories that users need to prepare, namely “train” and “val”, each of which includes a file “configuration” that stores the required information: lattice parameters, periodic boundary conditions, configurations, energies and atomic forces (if needed), dipole moments, polarizabilities, etc. For example, suppose users want to represent the NMA system with available atomic forces. The file “configuration” should then be written in the following format:

  1. The first line can be an arbitrary string, as long as it is not blank.
  2. The next three lines are the lattice vectors defining the unit cell of the system.
  3. The fifth line enables (1) or disables (0) the periodic boundary conditions in each direction. NMA is not a periodic system, so in this example the fifth line should be “pbc 0 0 0”. For some gas-surface systems only the x-y plane is periodic, and the corresponding line is “pbc 1 1 0”.
  4. The following N lines (N is the number of atoms in the system, here 12) list, from left to right: the atomic name, the relative atomic mass, the coordinates (x, y, z) of the geometry, and the atomic force vector (if forces are not incorporated in the training, these three columns can be omitted).
  5. The next line starts with “abprop:” followed by the target property (energy/dipole/polarizability).

One example is stored in the "data" folder.
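
Since the original example image is not reproduced here, the following schematic (with made-up numbers, for a hypothetical non-periodic three-atom water molecule trained on energy and forces rather than the 12-atom NMA system) illustrates the layout only; consult the example in the "data" folder for the authoritative format:

    water molecule, energy + forces (arbitrary comment line)
    20.0  0.0  0.0
     0.0 20.0  0.0
     0.0  0.0 20.0
    pbc 0 0 0
    O 15.999  0.000  0.000  0.119 -0.001  0.002 -0.034
    H  1.008  0.757  0.586  0.000  0.012 -0.008  0.015
    H  1.008 -0.757  0.586  0.000 -0.011  0.006  0.019
    abprop: -76.241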

Set up parameters

In this section, we introduce some hyperparameters concerning the embedded density and NN structures that are essential for obtaining an accurate representation. A more detailed introduction of all parameters can be found in the manual in the "manual" folder. These parameters are set up in two files, "input_nn" and "input_density", saved in the "para" folder of your work directory. One example of input_nn and input_density is placed in the "example" folder.

input_nn

  1. batchsize_train=64 # required parameter; type: integer
  2. batchsize_val=128 # required parameter; type: integer (Number of configurations used in each batch for training (batchsize_train) and validation (batchsize_val). Note that "batchsize_train" is a key parameter for training efficiency. Normally, a value large enough to keep GPU usage high gives more efficient training if you have sufficient data. For small training sets, however, a large batch size can decrease accuracy, probably because fewer gradient-descent steps are taken in each epoch. The loss in accuracy may be compensated by more epochs (increase "patience_epoch") or a larger learning rate, so some testing is required to balance accuracy and efficiency. The value of "batchsize_val" has no effect on accuracy, and thus a larger value is preferred.)
  3. oc_loop = 1 # type: integer (Number of iterations used to represent the orbital coefficients.)

input_density

  1. cutoff = 4.5 # type: real number (Cutoff distance.)
  2. nipsin = 2 # type: integer (Maximal angular momentum, which determines the orbital types (s, p, d, ...).)
  3. nwave=8 # type: integer (Number of radial Gaussian functions. This number should be a power of 2 for better efficiency.)
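
Put together, a minimal "input_density" might look like the following (values are illustrative; "atomtype" and "neigh_atoms" appear in the shipped example files and are documented in the manual):

    neigh_atoms = 60                 # maximal number of neighbors within the cutoff
    cutoff = 4.5                     # cutoff distance for the local environment
    nipsin = 2                       # maximal angular momentum (s, p, d)
    nwave = 8                        # number of radial Gaussian functions
    atomtype = ['H', 'C', 'N', 'O']  # element types present in the dataset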

MD simulations

As mentioned earlier, the package interfaces with the LAMMPS framework by creating a new pair_style (fireann). MD simulations can be run in a multi-process or multi-threaded fashion on both GPUs and CPUs. MD simulations based on other MD packages such as i-PI can also be executed through the existing i-PI/LAMMPS interface. In addition, MD simulations can also be performed through the ASE interface. More details can be found in the manual.

ASE interface

In the “ASE” folder, there is a Python script named “ase_reann.py”, which serves as an example of calculating energies and atomic forces using the model saved in “PES.pt”. Note that the “atomtype” used in inference should match that found in the “input_density” file. In this interface, we do not use the default ASE neighbor-list calculator. Instead, we employ a highly efficient Fortran implementation of a cell-linked algorithm to construct the neighbor list. To compile the Fortran code, f2py should be used; it generates a dynamic link library when the provided “run” script is executed. The resulting dynamic link library can then be called by ASE or any Python-based evaluator.
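
A rough sketch of what such a script does is shown below. The import path and constructor arguments here are assumptions (modeled loosely on "ASE/calculators/reann.py" and the issue reports below), so treat this as a workflow outline rather than the package's exact API; refer to "ase_reann.py" for the real calling sequence:

    from ase.io import read
    # Hypothetical import path for the calculator shipped in ASE/calculators/reann.py.
    from calculators.reann import REANN

    atoms = read("h2o.xyz")  # any structure readable by ASE
    # Assumed arguments: "atomtype" must match the order in input_density,
    # and "nn" points to the TorchScript model saved during training.
    atoms.calc = REANN(atomtype=["H", "O"], nn="PES.pt")

    energy = atoms.get_potential_energy()  # units follow the training data
    forces = atoms.get_forces()
    print(energy, forces)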

References

If you use this package, please cite these works.

  1. The original EANN model: Yaolong Zhang, Ce Hu and Bin Jiang, J. Phys. Chem. Lett. 10, 4962-4967 (2019).
  2. The EANN model for dipole/transition dipole/polarizability: Yaolong Zhang, Sheng Ye, Jinxiao Zhang, Jun Jiang and Bin Jiang, J. Phys. Chem. B 124, 7284-7290 (2020).
  3. The theory of the REANN model: Yaolong Zhang, Junfan Xia and Bin Jiang, Phys. Rev. Lett. 127, 156002 (2021).
  4. The details of the REANN implementation: Yaolong Zhang, Junfan Xia and Bin Jiang, J. Chem. Phys. 156, 114801 (2022).


Issues

Can't use ASE interface

Dear developers,

thanks for sharing this project. I'm having trouble using the ASE interface.

I'm trying to use the example script $REANN_FOLDER/ASE/test/ase_eann.py, but it gives the following error:

Atoms(symbols='Cu8Ce16O32', pbc=True, cell=[[15.304599762, 0.0, 0.0], [-7.652299881, 13.2541721886, 0.0], [0.0, 0.0, 17.5620002747]], constraint=FixAtoms(indices=[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]), calculator=EANN(...))
Traceback (most recent call last):
  File "/home/fabio/Software/REANN/ASE/test/ase_eann.py", line 36, in <module>
    dyn = LBFGS(atoms,trajectory='atom2.traj')
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/optimize/lbfgs.py", line 65, in __init__
    Optimizer.__init__(self, atoms, restart, logfile, trajectory, master,
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/optimize/optimize.py", line 234, in __init__
    self.set_force_consistent()
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/optimize/optimize.py", line 325, in set_force_consistent
    self.atoms.get_potential_energy(force_consistent=True)
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/atoms.py", line 728, in get_potential_energy
    energy = self._calc.get_potential_energy(
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/calculators/calculator.py", line 711, in get_potential_energy
    energy = self.get_property('energy', atoms)
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/calculators/calculator.py", line 739, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/ase/calculators/eann.py", line 100, in calculate
    force=pes(period_table,cart,tcell,species)[1]
  File "/home/fabio/miniconda3/envs/reann/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/pes/PES.py", line 14, in forward
    cell: Tensor,
    species: Tensor) -> Optional[Tuple[Tensor, Tensor]]:
    _0 = (self.neigh_list).forward(period_table, cart, cell, )
          ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    neigh_list, shifts, = _0
    output = (self.nnmod).forward(cart, neigh_list, shifts, species, True, )
  File "code/__torch__/pes/get_neigh.py", line 22, in forward
    _2 = torch.slice(coordinates, 0, 0, 9223372036854775807, 1)
    _3 = torch.slice(_2, 1, 0, 9223372036854775807, 1)
    _4 = torch.copy_(_3, _1, False)
         ~~~~~~~~~~~ <--- HERE
    _5 = self.cutoff
    _6 = torch.mul(torch.reciprocal(torch.abs(cell)), _5)

Traceback of TorchScript, original code (most recent call last):
  File "/home/scms/jlchen20/software/2021_05_10/eann/pes/get_neigh.py", line 30, in forward
        deviation_coor=torch.round(inv_coor-inv_coor[0])
        inv_coor=inv_coor-deviation_coor
        coordinates[:,:]=torch.einsum("ij,jk -> ik",inv_coor,cell)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        #represent a,b,c project on x,y,z
        #num_repeats = torch.ceil(torch.stack([self.cutoff/torch.max(torch.abs(cell[:,i])) for i in range(3)])) 
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

It fails when calling "LBFGS(atoms, trajectory='atom2.traj')".

Can you run the same exact script? I've been able to run train.py.

Thanks in advance,
Maxi

Batched input inference

Dear developers,

Once I have a trained model, I need to evaluate it on millions of points. In order to calculate energies and forces efficiently, I was thinking of using batched input on the GPU. However, when I introduce a batch dimension I get the following error
https://paste.ofcode.org/3bKCYPbUBhmmJYtfnLtjBhB

which means that the inference routines do not accept a batch dimension. I was wondering what would be the easiest way to overcome this. Maybe use density and get_neigh from the training routines?

Thanks in advance and best,
Ivan

get_neigh puzzle

Dear developers,

I have two questions regarding the REANN PES at inference time. A pre-trained REANN model expects the coordinates of atoms, the neighbor list, shifts, and species (as per https://github.com/zhangylch/REANN/blob/main/reann/ASE/calculators/reann.py). To find the neighbor list and shifts, the Fortran routine get_neigh.f90 is employed, which also outputs the coordinates of atoms (if I understand correctly, it wraps all atoms back into the unit cell in case some atoms move out of it during MD?).

  1. I have a pre-trained model which I loaded in a script similar to ASE/ase_reann.py (everything needed to replicate is in python.zip). I calculated the distances from the first atom to all other atoms (print modifications in calculators/reann.py) for the coordinates before and after get_neigh. If the initial coordinates are all within the unit cell (which they are for this specific example), I would not expect the distances to differ before and after get_neigh, but they seem to be different for some atoms. Could you please help me understand why this happens?

Output on my end for python:
https://paste.ofcode.org/3fR6fYRG3KstbmPqhTBPna

  2. I would like to use the get_neigh routine to get the coordinates, neighbor list, shifts, and number of neighbors in Fortran, so I concatenated all three f90 files from ASE/fortran-neigh into one get_neigh.f90 and call these routines from main.f90 (everything needed to replicate is in fortran.zip). Even though I use what seem to be identical input parameters for the get_neigh subroutine, I get different results than when I use those same subroutines within ASE/calculators/reann.py. Namely, I get a different number of neighbors, a different coor array, etc. I tried compiling with GCC/10.3.0 as well as newer versions like GCC/12.3.0, and with imkl/2021 and imkl/2023. Do you have an idea why I would be getting different results? I imagine it is most likely due to a difference in compilation, so I was hoping you could share some thoughts or advice on how to properly compile or otherwise solve the problem.

Output on my end for fortran:
https://paste.ofcode.org/tV8hEfjKenarDiEPVe4w3G

Thanks in advance,
Ivan

Confused about the units in the `configuration` file

I'm trying to train a model of my system (salty water, containing two kinds of ions and water molecules) with EANN or REANN. The dataset was prepared with CP2K (we have no license for VASP), which uses units of Angstrom (Cartesian coordinates), Hartree/Bohr (Cartesian atomic forces), and Hartree (energy). Here are my questions:

  1. What are the units used in the configuration file?
  2. Is (R)EANN suitable for my system? I found that (R)EANN is popular for gas-solid scattering, but my system is for exploring gas-liquid scattering. Anyway, I could give it a try and provide feedback.

Best wishes

How to use the GPU when training the example

When training the example, nvidia-smi shows that the GPU is not being used. The command I use is python3 -m torch.distributed.run --nproc_per_node=2 --nnodes=1 --standalone "/home/jinming/REANN/REANN/reann/". Where do I specify that the GPU should be used for training?

Can EANN train multiple systems at the same time?

How can REANN be trained on data from multiple systems? These systems have different numbers and kinds of atoms.
I have already tried writing the data of two systems into a single configuration file, as well as putting the two systems into two separate folders and writing both paths into input_nn, but neither worked.

Training fails because memory usage is too large

Hello, I want to use REANN to train on a fairly large dataset, but training on the cluster raises a memory error: RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 300486697920 bytes. Error code 12 (Cannot allocate memory)
Since the cluster only has 256 GB of memory, I guessed the dataset was too large, so I switched to a dataset of less than 500 MB, which trained successfully but still used nearly 20 GB of memory. Is this memory usage caused by a problem with my parameter settings? If I now want to train on the large dataset (about 17 GB, roughly 9 million structures), is there any way to do it?

Below are my parameter settings, which I modified directly from the example:

**input_nn**
# required parameters 
  start_table = 1                # start_table table for the fit with force(1) or without force(0)
                                 # start_table table for DM(2), TDM(3), polarizability(4)
  table_coor = 0                 # table_coor   0 for cartesian coordinates and 1 for direct coordinates
  nl = [64,64]          # neural network architecture   
 # nl = [512,512,512,512]          # neural network architecture   
  nblock =  1
  dropout_p=[0.0,0.0,0.0,0.0]
  table_init = 0             # 1 used for load parameters from pth 
  nkpoint=1                      # number of nkpoint NNs was employed to representation polarizability
# NN epoch and NN optimize parameters
  Epoch=20000                    # max iterations epoch                 
  patience_epoch = 200            # pre initial learning rate epoch   
  decay_factor = 0.5             # Factor by which the learning rate will be reduced. new_lr = lr * factor.
  start_lr = 0.001               # initial learning rate
  end_lr =1e-5                  # final learning rate
  re_ceff = 0                # factor for regularization
# wave epoch and wave optimize parameters
  ratio = 0.9                    # ratio for training
# =====================================================================
  batchsize_train = 128                # batch_size
  batchsize_test = 256                # batch_size
  e_ceff=0.1
  init_f=50                     # init_f
  final_f=0.5                     # final_f
#========================queue_size sequence for loading data into gpu
  queue_size=10
  print_epoch=5
  table_norm=True
  DDP_backend='nccl' 
  activate = 'Relu_like'
  dtype="float32"
#===========param for orbital coefficient ===============================================
     oc_nl = [64,64]          # neural network architecture   
     oc_nblock = 1
     oc_dropout_p=[0,0,0,0]
     oc_activate = 'Relu_like'
   #========================queue_size sequence for loading data into gpu
     oc_table_norm=True
     oc_loop=3
#========================floder used to save the data========================
  floder="/home/bwli/t1x_train/reann/"


**input_density**
#==============param for atomic energy=======================
  neigh_atoms=60
  cutoff=6e0
  nipsin=2
  atomtype=['H', 'C', 'N', 'O']
  nwave=7
