Introduction to libEnsemble

libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations.

libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems and expands the class of problems that can benefit from increased parallelism.

libEnsemble aims for:

Extreme scaling: Run on or across laptops, clusters, and leadership-class machines.
Dynamic Ensembles: Generate new tasks on-the-fly based on previous computations.
Dynamic Resource Management: Reassign resource partitions of any size for tasks.
Monitoring/killing of applications: Ensemble members can poll or kill running apps.
Resilience/fault tolerance: libEnsemble can restart incomplete tasks or entire ensembles.
Portability and flexibility: Run identical libEnsemble scripts on different machines.
Exploitation of persistent data/control flow: libEnsemble can pass data between ensemble members.
Low start-up cost: Default single-machine deployments don't require additional services.

libEnsemble's users select or supply generator and simulator Python functions; these respectively produce candidate parameters and perform/monitor computations that use those parameters. Generator functions can train models, perform optimizations, and test candidate solutions in a batch or streaming fashion based on simulation results. Simulator functions can themselves use parallel resources and involve libraries or executables that are not written in Python.
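As a rough illustration of the generator/simulator split described above, the sketch below defines one function of each kind and calls them by hand. The four-argument signature and the `out`/`user` keys follow the pattern of libEnsemble's bundled examples, but the function names, the bounds, and the direct calls at the bottom are assumptions made for illustration. In a real workflow the manager and workers invoke these functions, so consult the user guide for the exact interface.

```python
import numpy as np


def gen_uniform(H, persis_info, gen_specs, libE_info=None):
    """Generator sketch: propose a batch of points drawn uniformly from [lb, ub]."""
    user = gen_specs["user"]
    lb, ub = np.asarray(user["lb"]), np.asarray(user["ub"])
    batch = user["gen_batch_size"]

    # One structured-array row per candidate point.
    H_o = np.zeros(batch, dtype=gen_specs["out"])
    H_o["x"] = persis_info["rand_stream"].uniform(lb, ub, (batch, len(lb)))
    return H_o, persis_info


def sim_norm(H, persis_info, sim_specs, libE_info=None):
    """Simulator sketch: evaluate the Euclidean norm of each candidate point."""
    H_o = np.zeros(len(H), dtype=sim_specs["out"])
    H_o["f"] = np.linalg.norm(H["x"], axis=1)
    return H_o, persis_info


# Hypothetical specs; the values are chosen only for this example.
gen_specs = {
    "out": [("x", float, (2,))],
    "user": {"lb": [-3.0, -2.0], "ub": [3.0, 2.0], "gen_batch_size": 5},
}
sim_specs = {"in": ["x"], "out": [("f", float)]}
persis_info = {"rand_stream": np.random.default_rng(0)}

# Stand in for the manager: generate candidates, then simulate them.
candidates, persis_info = gen_uniform(None, persis_info, gen_specs)
results, persis_info = sim_norm(candidates, persis_info, sim_specs)
print(results["f"])
```

Both functions return a NumPy structured array plus the (possibly updated) `persis_info`, which is how per-function state such as a random stream can be carried between calls in this sketch.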

With a basic familiarity with Python and NumPy, users can easily incorporate any other mathematics, machine-learning, or resource-management libraries into libEnsemble workflows.

libEnsemble employs a manager/worker scheme that communicates via MPI, multiprocessing, or TCP. Workers control and monitor any level of work using the aforementioned generator and simulator functions, from small subnode tasks to huge many-node computations.

libEnsemble includes an Executor interface so application-launching functions are portable, resilient, and flexible; it also automatically detects available nodes and cores, and can dynamically assign resources to workers.
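The launch, poll, and kill pattern that an executor offers can be sketched with Python's standard `subprocess` module. This is not libEnsemble's Executor API; `run_with_timeout`, its parameters, and the state strings it returns are hypothetical names used only to show a worker monitoring a launched application and killing it when it runs too long.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout, poll_interval=0.05):
    """Launch cmd, poll it until it exits or timeout seconds pass, then kill it.

    Returns a (state, returncode) pair, where state is "finished" or "killed".
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:  # the application has exited
            return "finished", proc.returncode
        time.sleep(poll_interval)
    proc.kill()  # the application overran its time budget
    proc.wait()  # reap the process so no zombie is left behind
    return "killed", proc.returncode


# A quick task finishes on its own; a long-running one is killed.
quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(quick, timeout=10.0))
print(run_with_timeout(slow, timeout=0.5))
```

Polling in a loop, rather than blocking on the process, leaves room for the caller to inspect output files or other signals between checks before deciding to kill the run.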

libEnsemble performs best on Unix-like systems such as Linux and macOS. See the FAQ for more information.

Dependencies

Required dependencies:

Python 3.7 or above
NumPy
psutil
setuptools

When using mpi4py for libEnsemble communications:

A functional MPI 1.x/2.x/3.x implementation, such as MPICH, built with shared/dynamic libraries
mpi4py v2.0.0 or above

Optional dependencies:

Balsam

As of v0.9.0, libEnsemble features an updated Balsam Executor that lets workers schedule and launch applications anywhere a Balsam site is running, including on remote machines.

pyyaml

libEnsemble is typically configured and parameterized via Python dictionaries. libEnsemble can also be parameterized via yaml.

funcX

As of v0.9.0, libEnsemble features a cross-system capability powered by funcX, a function-as-a-service platform to which workers can submit remote generator or simulator function instances. This feature can help distribute an ensemble across systems and heterogeneous resources.

psi-j-python

As of v0.9.3, libEnsemble features a set of command-line utilities for submitting libEnsemble jobs to almost any system or scheduler via a PSI/J Python interface. tqdm is also required.

The example simulation and generation functions and tests require the following:

SciPy
mpmath
petsc4py
DEAP
DFO-LS
Tasmanian
NLopt
PETSc/TAO - Can optionally be installed by pip along with petsc4py
Surmise

PETSc and NLopt must be built with shared libraries enabled and be present in sys.path (e.g., via setting the PYTHONPATH environment variable). NLopt should produce a file nlopt.py if Python is found on the system. See the NLopt documentation for information about building NLopt with shared libraries. NLopt may also require SWIG to be installed on certain systems.
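A quick way to confirm that such packages are visible on `sys.path` is to ask Python's import machinery whether it can locate them. The helper below is a small sketch; `missing_modules` is a hypothetical name, and the module names `nlopt` and `petsc4py` are assumed to be the import names of the packages discussed above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed for the packages mentioned above.
print(missing_modules(["nlopt", "petsc4py"]))
```

An empty list means every named module can be located. Any name that is printed points to a package that still needs to be installed or added to `PYTHONPATH`.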

Installation

libEnsemble can be installed or accessed from a variety of sources.

Install libEnsemble and its dependencies from PyPI using pip:

pip install libensemble

Install libEnsemble with Conda from the conda-forge channel:

conda config --add channels conda-forge
conda install -c conda-forge libensemble

Install libEnsemble using the Spack distribution:

spack install py-libensemble

libEnsemble is included in the xSDK Extreme-scale Scientific Software Development Kit from xSDK version 0.5.0 onward. Install the xSDK and load the environment with:

spack install xsdk
spack load -r xsdk

The codebase, tests and examples can be accessed in the GitHub repository. If necessary, you may install all optional dependencies (listed above) at once with:

pip install libensemble[extras]

A tarball of the most recent release is also available.

Testing

The provided test suite includes both unit and regression tests and is run regularly on:

GitHub Actions

The test suite requires the mock, pytest, pytest-cov, and pytest-timeout packages. Run it from the libensemble/tests directory of the source distribution:

./run-tests.sh

Further options are available. To see a complete list of options, run:

./run-tests.sh -h

The regression tests also serve as example libEnsemble scripts and can be run directly in libensemble/tests/regression_tests. For example:

cd libensemble/tests/regression_tests
python test_uniform_sampling.py --comms local --nworkers 3

The libensemble/tests/scaling_tests directory includes example scripts that use the executor to run compiled applications. These are tested regularly on HPC systems.

If you have the libEnsemble source code, you can download (but not install) the testing prerequisites and run the tests with:

python setup.py test

in the top-level directory containing the setup script.

Coverage reports are produced separately for unit tests and regression tests under the relevant directories. For parallel tests, the union of the coverage from all processes is taken. A combined coverage report is also created at the top level and can be viewed at libensemble/tests/cov_merge/index.html after run-tests.sh has completed. The coverage results are available online at Coveralls.

Basic Usage

The default manager/worker communications mode is MPI. The user script is launched as:

mpiexec -np N python myscript.py

where N is the number of processes. This will launch one manager and N-1 workers.

If running in local mode, which uses Python's multiprocessing module, the local comms option and the number of workers must be specified, either in libE_specs or on the command line using the parse_args() function. The script can then be run as a regular Python script:

python myscript.py --comms local --nworkers N

This will launch one manager and N workers.
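The relationship between the launch command and the number of workers can be sketched as follows. The parser below is a hypothetical stand-in written with `argparse`, not libEnsemble's `parse_args()`; only the `--comms` and `--nworkers` options shown above are modeled, and `worker_count` is an illustrative helper.

```python
import argparse


def parse_comms_args(argv):
    """Parse the two command-line options used in the examples above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--comms", choices=["mpi", "local"], default="mpi")
    parser.add_argument("--nworkers", type=int, default=None)
    return parser.parse_args(argv)


def worker_count(comms, nprocs=None, nworkers=None):
    """Return the number of workers for a given communications mode.

    With MPI, one of the nprocs processes acts as the manager, leaving
    nprocs - 1 workers. In local mode the manager is separate, so nworkers
    is used as given.
    """
    if comms == "mpi":
        if nprocs is None or nprocs < 2:
            raise ValueError("MPI mode needs a manager and at least one worker")
        return nprocs - 1
    if nworkers is None or nworkers < 1:
        raise ValueError("nworkers must be given and be at least 1")
    return nworkers


args = parse_comms_args(["--comms", "local", "--nworkers", "4"])
print(worker_count(args.comms, nworkers=args.nworkers))  # 4
print(worker_count("mpi", nprocs=4))  # 3
```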

See the user guide for more information.

Resources

Support:

Email questions or request libEnsemble Slack page access from [email protected].
Open issues on GitHub.
Join the libEnsemble mailing list for updates about new releases.

Further Information:

Documentation is provided by Read the Docs.
Contributions to libEnsemble are welcome.
An overview of libEnsemble's structure and capabilities is given in this manuscript and poster.
Examples of production user functions and complete workflows can be viewed, downloaded, and contributed to in the libEnsemble Community Examples repository.

Citation:

Please use the following to cite libEnsemble:

@techreport{libEnsemble,
  title   = {{libEnsemble} Users Manual},
  author  = {Stephen Hudson and Jeffrey Larson and Stefan M. Wild and
             David Bindel and John-Luke Navarro},
  institution = {Argonne National Laboratory},
  number  = {Revision 0.9.3},
  year    = {2022},
  url     = {https://buildmedia.readthedocs.org/media/pdf/libensemble/latest/libensemble.pdf}
}

@article{Hudson2022,
  title   = {{libEnsemble}: A Library to Coordinate the Concurrent
             Evaluation of Dynamic Ensembles of Calculations},
  author  = {Stephen Hudson and Jeffrey Larson and John-Luke Navarro and Stefan Wild},
  journal = {{IEEE} Transactions on Parallel and Distributed Systems},
  volume  = {33},
  number  = {4},
  pages   = {977--988},
  year    = {2022},
  doi     = {10.1109/tpds.2021.3082815}
}

Example Compatible Packages

libEnsemble and the Community Examples repository include example generator functions for the following libraries:

APOSMM: Asynchronously parallel optimization solver for finding multiple minima. Supported local optimization routines include:
- DFO-LS: Derivative-free solver for (bound constrained) nonlinear least-squares minimization
- NLopt: Library for nonlinear optimization, providing a common interface for various methods
- scipy.optimize: Open-source solvers for nonlinear problems, linear programming, constrained and nonlinear least-squares, root finding, and curve fitting
- PETSc/TAO: Routines for the scalable (parallel) solution of scientific applications
DEAP: Distributed evolutionary algorithms
Distributed optimization methods for minimizing sums of convex functions. Methods include:
- Primal-dual sliding (https://arxiv.org/pdf/2101.00143)
- Distributed gradient descent with gradient tracking (https://arxiv.org/abs/1908.11444)
- Proximal sliding (https://arxiv.org/abs/1406.0919)
ECNoise: Estimating Computational Noise in Numerical Simulations
Surmise: Modular Bayesian calibration/inference framework
Tasmanian: Toolkit for Adaptive Stochastic Modeling and Non-Intrusive ApproximatioN
VTMOP: Fortran package for large-scale multiobjective multidisciplinary design optimization

libEnsemble has also been used to coordinate many computationally expensive simulations. Select examples include:

OPAL: Object Oriented Parallel Accelerator Library. (See this IPAC manuscript.)
WarpX: Advanced electromagnetic particle-in-cell code. (See example WarpX + libE scripts.)

See a complete list of example user scripts.
