suncat-center / catlearn

A machine learning environment for atomic-scale modeling in surface science and catalysis.

Home Page: http://catlearn.readthedocs.io/

License: GNU General Public License v3.0

Python 99.80% Shell 0.02% Dockerfile 0.18%
atomistic-machine-learning catalysis catalyst computational-chemistry machine-learning materials-informatics materials-science nanotechnology python

catlearn's People

Contributors

dependabot[bot], doylead, graph-theory-natcatal, ikowalec, jagarridotorres, jianglst, mamunm, mhangaard, mhoffman, pcjennings, raulf2012, schlexer, vieri2006, vladislavivanistsev, ziyun-wang


catlearn's Issues

Update Gradients Tutorials

The gradients tutorials need updating to Jupyter notebook format, with some additional discussion of what is going on and what to expect.

PLOTNEB problem with plot's text positional requirement

I am currently testing the tutorials, in particular tutorials/11_NEB/04_CO_Cu111/nebCO.py.
Everything ran properly except the PLOTNEB module.
I encountered the following output:

The ML-NEB algorithm required  19.636363636363637 times less number of function evaluations than the standard NEB algorithm.
Energy barrier: 0.05631016623911789 eV
Traceback (most recent call last):
  File "/home/krojas/student/rai/catlearn_test/test03/nebCO.py", line 137, in <module>
    plotneb(trajectory='ML-NEB.traj', view_path=False)
  File "/home/krojas/APPS/mambaforge/envs/catlearn/lib/python3.10/site-packages/catlearn/optimize/tools.py", line 50, in plotneb
    ax.annotate(s=str(np.round(e_barrier, 3))+' eV',
TypeError: Axes.annotate() missing 1 required positional argument: 'text'

Method to replicate:

  1. Create a conda environment: conda create -n pycatlearn python=3.10 catlearn
  2. Download and run tutorials/11_NEB/04_CO_Cu111/nebCO.py

May I ask how to fix this?
Should I specify the matplotlib version?
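
For what it's worth, the error comes from Matplotlib 3.x, which renamed the first argument of Axes.annotate() from s to text and later removed the s= keyword, so the call in catlearn/optimize/tools.py no longer matches the current signature. Below is a minimal sketch of the fixed call (the barrier value and coordinates are illustrative); patching tools.py to pass the string positionally, or pinning an older Matplotlib (e.g. matplotlib<3.3), should both avoid the TypeError.

import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, just to check the call
import matplotlib.pyplot as plt

e_barrier = 0.056  # illustrative value; the real one is computed inside plotneb

fig, ax = plt.subplots()
# Matplotlib 3.x expects the annotation string as the positional `text` argument
# (the old `s=` keyword was removed), so this call succeeds:
ax.annotate(str(np.round(e_barrier, 3)) + ' eV', xy=(0.5, 0.5))
fig.savefig('annotate_check.png')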

Force prediction in the Gaussian process model

Hi all,

I have noticed that CatLearn includes both energy and forces in the training of a Gaussian process model, but only predicts energy from the GP model. The predicted forces are computed using finite differences according to Phys. Rev. Lett. 122, 156001 (2019). However, predicting forces directly from the GP model is also straightforward once forces are included in the training, just as in J. Chem. Phys. 147, 152720 (2017). Why not predict forces directly from the GP model in CatLearn? Is there any benefit to using the finite-difference approach?

Best,
Zeyuan

CI Docs

It would be good if the CI could generate the sphinx docs on-the-fly so we didn't have to keep updating things. It would probably be as simple as calling:

sphinx-apidoc -o docs catlearn

within an additional post-build step of the CI.

MLNeb hangs with latest ASE from git

The tutorial described in 11_NEB/00_Tutorial/Tutorial_MLNEB.ipynb hangs when using the latest ASE git head due to changes in the Dynamics class located in ase/optimize/optimize.py. Specifically, this loop can repeat forever:

while ml_converged is False:
    # Save prev. positions:
    prev_save_positions = []
    for i in self.images:
        prev_save_positions.append(i.get_positions())
    neb_opt.run(fmax=(fmax * 0.85), steps=1)
    get_results_predicted_path(self)
    unc_ml = np.max(self.uncertainty_path[1:-1])
    e_ml = np.max(self.e_path[1:-1])
    if e_ml >= self.max_target + 0.2:
        for i in range(0, self.n_images):
            self.images[i].positions = prev_save_positions[i]
        if self.fullout is True:
            print('Pred. energy above max. energy. '
                  'Early stop.')
        ml_converged = True
    if unc_ml >= max_step:
        for i in range(0, self.n_images):
            self.images[i].positions = prev_save_positions[i]
        if self.fullout is True:
            print('Maximum uncertainty reach. Early stop.')
        ml_converged = True
    if neb_opt.converged():
        ml_converged = True
    n_steps_performed = neb_opt.__dict__['nsteps']
    if np.isnan(ml_neb.emax):
        sp = str(-self.n_images) + ':'
        self.images = read('./all_predicted_paths.traj', sp)
        for i in self.images:
            i.get_potential_energy()
        n_steps_performed = 10000
    if n_steps_performed > ml_steps-1:
        if self.fullout is True:
            print('Not converged yet...')
        ml_converged = True

This happens when neb_opt doesn't immediately converge, because neb_opt.run(..., steps=1) now returns before performing any steps if neb_opt.max_steps has been reached. Additionally, neb_opt.nsteps isn't incremented by neb_opt.run(...) beyond max_steps, so the bailout condition of n_steps_performed > ml_steps-1 never evaluates to True.

A simple workaround would be to manually set neb_opt.nsteps = 0 immediately before neb_opt.run(..). There's probably a more elegant way of telling ASE's NEB class to iterate once, but that would require more changes to the code, and the workaround I describe seems to work for me.
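
For reference, a minimal sketch of that workaround as applied inside the loop quoted above (assuming the rest of the loop is unchanged):

    # Workaround sketch: reset the optimizer's step counter each iteration so that
    # run(..., steps=1) always performs one step, even after ASE's Dynamics class
    # would otherwise consider max_steps reached.
    neb_opt.nsteps = 0
    neb_opt.run(fmax=(fmax * 0.85), steps=1)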

As an aside, I don't understand why CatLearn accesses nsteps through neb_opt's __dict__ attribute (L407). Is there a reason for this atypical access pattern?

Convergence issue with newer ASE

Workaround: ML-NEB is still stable and compatible with ASE 3.17.0.

The two latest stable ASE releases, however, break ML-NEB, causing each iteration to slow down dramatically and possibly preventing convergence.

Help is wanted in identifying the bug.

sklearn deprecated Imputer breaking `clean_data.py` module

The most recent version of sklearn (0.22) has removed the Imputer class from sklearn.preprocessing (previously implemented in preprocessing.imputation), and as a result the following traceback is raised:

~/TEMP/CatLearn/catlearn/preprocess/clean_data.py in <module>
      2 import numpy as np
      3 from collections import defaultdict
----> 4 from sklearn.preprocessing import Imputer
      5 from scipy.stats import skew
      6 

ImportError: cannot import name 'Imputer'

The following deprecation message is located in the old preprocessing.imputation file:

@deprecated("Imputer was deprecated in version 0.20 and will be "                      
    "removed in 0.22. Import impute.SimpleImputer from "                       
    "sklearn instead.")                

It looks like simply replacing Imputer with SimpleImputer would be sufficient, but we should make sure that these classes are in fact equivalent before fixing this.
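
The two classes are close but not identical: SimpleImputer lives in sklearn.impute, defaults to np.nan for missing_values, and no longer has an axis argument. A guarded import keeps both old and new scikit-learn working; a sketch, assuming clean_data.py only relies on the shared strategy/fit/transform interface:

import numpy as np

# SimpleImputer replaces the Imputer class removed in scikit-learn 0.22.
try:
    from sklearn.impute import SimpleImputer as Imputer
except ImportError:  # scikit-learn < 0.20
    from sklearn.preprocessing import Imputer

# Shared interface: mean imputation of missing values.
data = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
clean = Imputer(strategy='mean').fit_transform(data)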

Evaluate diagonal only on predict std

Predicting the mean and covariance on a test set X_test scales as N**2, because we construct K(X_test, X_test). When X_test is large, we would prefer not to calculate the full covariance matrix, but only the standard deviation based on its diagonal (see gpfunctions.uncertainty).
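
For reference, a sketch of the diagonal-only evaluation (the names below are hypothetical, not CatLearn's internal ones): only the prior variance of each test point and a row-wise quadratic form are needed, so the N x N test covariance is never formed.

import numpy as np

def predict_std_diag(ktb, cinv, kxx_diag):
    """Return only the predictive standard deviation (diagonal of the covariance).

    ktb      : (N_test, N_train) kernel between test and training data (hypothetical name)
    cinv     : (N_train, N_train) inverse training covariance (hypothetical name)
    kxx_diag : (N_test,) prior variance k(x*, x*) of each test point (hypothetical name)
    """
    # Row-wise quadratic form: diag(ktb @ cinv @ ktb.T) without building the N_test x N_test matrix.
    quad = np.einsum('ij,jk,ik->i', ktb, cinv, ktb)
    var = np.maximum(kxx_diag - quad, 0.0)  # clip tiny negative values from round-off
    return np.sqrt(var)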

VASP internal relaxation for CatLearn NEB

Hi all,
I am just wondering if it is possible to run CatLearn ML-NEB without using the ASE VASP calculator, instead using VASP's internal relaxation (i.e. optimizing the initial and final end-points within VASP).

Update Docstrings

  • Add docstrings to all functions.
  • Make sure every docstring has a Returns section.
  • Document class attributes in the docstrings.

Question: Correct parallelization usage

Hi, I would like to ask how to run the CatLearn code with proper parallelization.

I tested CatLearn and compared it with the traditional neb.x code of Quantum ESPRESSO.
With the same system and number of images, CatLearn (single node, 64 cores) and neb.x (5 nodes, 64 cores each, image-parallelized) take the same wall time, which means CatLearn is more efficient with resources.

I would like to expand on this by using more nodes for the DFT calculation, say 5 nodes for one DFT evaluation.
When I run CatLearn this way (5 nodes, 64 cores each), the calculation becomes rather slow.
The 5-node setup is applied to the DFT calculation via ASE_ESPRESSO_COMMAND.
I think the bottleneck may be how CatLearn handles the parallelization across 5 nodes (at least the automatic treatment does not seem correct).

May I ask how to do this properly?
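
For reference, a hedged sketch of keeping the multi-node MPI launch confined to the DFT step: the pw.x command goes into ASE_ESPRESSO_COMMAND (the node and core counts below are purely illustrative), while the CatLearn driver script itself is started as a single serial Python process rather than under mpirun.

import os

# Illustrative only: run each DFT evaluation on 5 nodes x 64 cores,
# using ASE's PREFIX.pwi/PREFIX.pwo convention for the Espresso calculator.
os.environ['ASE_ESPRESSO_COMMAND'] = (
    'mpirun -np 320 pw.x -in PREFIX.pwi > PREFIX.pwo'
)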

Error in docs on featurizing

Just a small error, but the docs say that one should use:

from catlearn.fingerprint.setup import FeatureGenerator

whereas the FeatureGenerator class now seems to live in catlearn.featurize.setup.
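
A guarded import keeps scripts written against either layout working (a sketch, assuming only the module path changed):

# FeatureGenerator moved from catlearn.fingerprint.setup to catlearn.featurize.setup.
try:
    from catlearn.featurize.setup import FeatureGenerator
except ImportError:  # older CatLearn releases
    from catlearn.fingerprint.setup import FeatureGenerator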

Parallel Testing

There are some issues with parallelism in Python 2.7 with the adsorbate fingerprinting, and possibly elsewhere. This is specific to 2.7 and does not affect Python 3+.

In general I think it would be best for tests to be run with nprocs=None to pick up these errors in any code with parallelism.

TravisCI has a server for parallel testing that could be used, I think, but specific tests would need to be written for this. Otherwise, I think we only have access to a single core by default, so even when nprocs=None everything is still run in serial.

Highly pedantic `requirements.txt`

Right now the requirements.txt looks as follows:

ase==3.16.0
click==6.7
cycler==0.10.0
decorator==4.3.0
flask==1.0.2
h5py==2.7.1
itsdangerous==0.24
jinja2==2.10
kiwisolver==1.0.1
markupsafe==1.0
matplotlib==2.2.2
networkx==2.1.0
numpy==1.14.3
pandas==0.23.0 
pyparsing==2.2.0
python-dateutil==2.7.3
pytz==2018.4
scikit-learn==0.19.1
scipy==1.1.0
six==1.11.0
tqdm==4.23.3
werkzeug==0.14.1

This means that, in order to install CatLearn with pip install catlearn, my system needs to match all of these packages down to the patch level, or pip will refuse to install it. Case in point: if I upgrade my numpy today I get version 1.14.4, but pip then refuses to install CatLearn because it thinks CatLearn requires exactly 1.14.3. This leaves me with two options: either downgrade all my other packages to match CatLearn's pins exactly (and potentially break other packages), or escape into a virtualenv or Docker image to spin up exactly those versions.

Would it be possible to relax some of these version numbers using >= or ~=? >= requires a version greater than or equal to the stated number and allows trailing digits to be omitted, so numpy>=1.4 would accept anything greater than or equal to 1.4.0. ~= is the compatible-release specifier, which is one step looser. To quote the essential part of the pip documentation:

Mopidy-Dirble ~= 1.1        # Compatible release. Same as >= 1.1, == 1.*

The pip documentation lists the different possible version specifiers: https://pip.pypa.io/en/stable/reference/pip_install/#example-requirements-file . Better yet, unless there is a specific reason, I would never pin the third (patch) version number at all: assuming the dependency sticks to semantic versioning, patch releases only fix bugs and do not break backwards compatibility.

Tests generate 200000 PendingDeprecationWarnings, which causes failure.

This is due to use of np.mat or np.matrix in ASE. ASE has fixed this in the master branch, but the warnings will crash our tests until the next ASE release.

All warnings were supposed to be filtered to "once" or "ignore", but unfortunately either unittest or pytest overrides this.

MLMIN bug with ASE 3.19

There is a major bug in MLMIN when used with ASE 3.19. It behaves like the previous NEB bug, if I'm not mistaken.
The bug: the geometry optimization iteration using GPR (line 283) does not work well.

I compared runs with ASE 3.17 and ASE 3.19; the one that optimizes quickly is the ASE 3.17 run.
[Screenshots from 2020-07-05 comparing the two optimization runs]

Requirements for PyPI.

The setup.py file currently has the requirements defined.

    install_requires=['ase==3.16.0',
                      'h5py==2.7.1',
                      'networkx==2.1.0',
                      'numpy==1.14.2',
                      'pandas==0.22.0',
                      'pytest-cov==2.5.1',
                      'scikit-learn==0.19.1',
                      'scipy==1.0.1',
                      'tqdm==4.20.0',
                      ],

This is a bad idea as they need to be kept updated along with the requirements.txt file. At some point it is highly likely these will diverge.

There needs to be a way to automatically parse requirements.txt when setup.py is run that is also compatible with the uploaded PyPI package.
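
One possible approach (a sketch, not the repository's current setup.py) is to read requirements.txt at build time; note that the file then has to be shipped with the sdist, e.g. via MANIFEST.in, for the uploaded PyPI package to install.

# setup.py sketch: derive install_requires from requirements.txt instead of duplicating pins.
from pathlib import Path
from setuptools import setup, find_packages

requirements = [
    line.strip()
    for line in Path('requirements.txt').read_text().splitlines()
    if line.strip() and not line.startswith('#')
]

setup(
    name='CatLearn',
    packages=find_packages(),
    install_requires=requirements,
    # plus the remaining metadata from the existing setup.py
)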
