
zeus's Introduction


zeus is a Python implementation of the Ensemble Slice Sampling method.

  • Fast & robust Bayesian inference,
  • Efficient Markov Chain Monte Carlo (MCMC),
  • Black-box inference, no hand-tuning,
  • Excellent performance in terms of autocorrelation time and convergence rate,
  • Scales to multiple CPUs without any extra effort,
  • Automated convergence diagnostics.


Example

For instance, if you wanted to draw samples from a 10-dimensional Gaussian, you would do something like:

import zeus
import numpy as np

def log_prob(x, ivar):
    # Log-probability of a zero-mean Gaussian with inverse variances ivar
    return -0.5 * np.sum(ivar * x**2.0)

nsteps, nwalkers, ndim = 1000, 100, 10
ivar = 1.0 / np.random.rand(ndim)        # random inverse variances
start = np.random.randn(nwalkers, ndim)  # initial walker positions

sampler = zeus.EnsembleSampler(nwalkers, ndim, log_prob, args=[ivar])
sampler.run_mcmc(start, nsteps)
chain = sampler.get_chain(flat=True)     # flattened samples, shape (nsteps * nwalkers, ndim)

Documentation

Read the docs at zeus-mcmc.readthedocs.io

Installation

To install zeus using pip run:

pip install zeus-mcmc

To install zeus in a Conda (or Anaconda) environment, use:

conda install -c conda-forge zeus-mcmc

Attribution

Please cite the following papers if you found this code useful in your research:

@article{karamanis2021zeus,
  title={zeus: A Python implementation of Ensemble Slice Sampling for efficient Bayesian parameter inference},
  author={Karamanis, Minas and Beutler, Florian and Peacock, John A},
  journal={arXiv preprint arXiv:2105.03468},
  year={2021}
}

@article{karamanis2020ensemble,
  title={Ensemble slice sampling: Parallel, black-box and gradient-free inference for correlated & multimodal distributions},
  author={Karamanis, Minas and Beutler, Florian},
  journal={arXiv preprint arXiv:2002.06212},
  year={2020}
}

Licence

Copyright 2019-2021 Minas Karamanis and contributors.

zeus is free software made available under the GPL-3.0 License. For details see the LICENSE file.

zeus's People

Contributors

ewoudwempe, joezuntz, johannesbuchner, mattpitkin, minaskar, neel-maniar


zeus's Issues

Stepping procedure

Is there a particular reason why the stepping-out is done in linear steps rather than by doubling the step width (as in other implementations)? I am running into the limit (1e4) when testing on a 100-d Gaussian with standard deviations ranging from 1e-1 ... 1e-9, and increasing the limit did not solve this.
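
For reference, the doubling variant being suggested looks roughly like this: a sketch of Neal's (2003) step-out along a single slice direction, with the current point at position 0. This is not the zeus implementation, and Neal's scheme also requires a matching acceptance test during shrinkage, omitted here for brevity:

import numpy as np

def step_out_doubling(log_prob_1d, log_y, w=1.0, max_doublings=30):
    # Randomly position an initial interval of width w around the point
    u = np.random.rand()
    L, R = -u * w, (1.0 - u) * w
    for _ in range(max_doublings):
        # Stop once both endpoints fall outside the slice {t : log p(t) > log_y}
        if log_prob_1d(L) < log_y and log_prob_1d(R) < log_y:
            break
        # Double the interval on one randomly chosen side
        if np.random.rand() < 0.5:
            L -= (R - L)
        else:
            R += (R - L)
    return L, R

Because the interval grows as 2^k, scale mismatches of many orders of magnitude are covered in a few dozen doublings rather than millions of linear steps.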

[ENH] support for named parameters

In emcee I added a PR for having named parameters (proposal, and the PR). This allows for dict-like access to the parameters rather than positional references. Example:

from typing import Dict

import numpy as np

x = np.random.randn(100)  # 100 samples ~ N(0, 1)

def lnpdf(params: Dict[str, float]) -> float:
    # A Gaussian log-PDF with named parameters
    mean = params["mean"]
    var = params["var"]
    if var <= 0:
        return -np.inf
    return -0.5 * ((mean - x) ** 2 / var + np.log(2 * np.pi * var)).sum()

The main benefit is that this allows one to build hierarchical models without having to remember "oh, the first parameter for the N-th submodel is at position XYZ" (a common source of bugs).

Since zeus has the same API, I'd be happy to add it here too.

Enhancement Request: Ability to call chain manager by external function / module

I was able to get the test_mpi.py example to work, but I have not been able to get the chain manager to work when trying to use it inside a function or importing it. This functionality is desirable so that people can use custom run files or make other packages that use zeus as a dependency (which is what I am doing).

I am providing example files and how to reproduce the behaviour.


In the attached files 4.0MPITest.zip
This works:
mpiexec -n 5 python test_mpi.py

The three below do not work. They run, but they do not have the correct behaviour: they do not actually seem to initiate the walkers, and they do not export the expected ".npy" files.

mpiexec -n 5 python test_mpi_by_function1.py
mpiexec -n 5 python test_mpi_by_function2.py
mpiexec -n 5 python test_mpi_by_function3.py

The fourth case below runs for a while, then 'stalls' or crashes with a message that it can't pickle the object. This is when using zeus 2.0 (I don't think this case crashed before zeus 2.0, but I am not sure anymore).
mpiexec -n 5 python test_mpi_by_function4.py


Proposed solution: unknown. But it would be nice if a clever way could be found to allow the chain manager to work inside functions and when imported elsewhere. At present, it is rather puzzling to me that it doesn't work.

At the moment I am just going to use MPI externally. For example, if I have 8 processors, I will run 8 entirely separate EnsembleSampler instances and then combine the results. This is not ideal for a variety of reasons. I thought about generating a custom Python file each time and running MPI from the command line, but that is not a suitable solution for me.

get_last_sample should be function

This is a minor issue, as I can work around the difference using get_chain instead.

In emcee, get_last_sample is a function which returns the last sample. In zeus this is a property. For compatibility, it would be useful to match the emcee API. It also doesn't make sense to me to use a property called get_XXX.
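
For reference, the workaround is a one-liner, assuming the standard get_chain output shape of (nsteps, nwalkers, ndim):

last = sampler.get_chain()[-1]  # shape (nwalkers, ndim), like emcee's get_last_sample()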

PS Thanks for working on this software and algorithm - initial tests are looking good

specify random seed?

Unless I missed something, it doesn't seem like there's an option to supply a seed for the random number generator. If possible, I'd like to make my analyses reproducible. While I haven't used emcee, I see that there's a State object that appears to be usable for this purpose.
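
As a stopgap, seeding NumPy's global generator before creating the sampler should make runs repeatable, assuming zeus draws from NumPy's global random state (an assumption, not documented behaviour):

import numpy as np

np.random.seed(42)  # assumption: zeus uses the global NumPy RNG internally
start = np.random.randn(nwalkers, ndim)  # reproducible initial positions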

Poor parallelisation scaling

I found that running zeus with n_MPI = n_walker/2 gives poor efficiency, with half of the CPU time being spent idling.

If I understand the code in EnsembleSampler.sample correctly, the stepping-out procedure is repeated until all walkers in the ensemble have reached their step-out position, and only then does the shrinking procedure begin. This means that the ensemble has to wait until the last walker reaches the step-out position, during which all other walkers are idling. Please correct me if I misunderstood the implementation.

Since the stepping-out and shrinking procedures are independent for each walker once the directions are set, it should be possible to restructure the loops so that walkers can start shrinking as soon as they have finished stepping out, rather than having to wait for the last walker; see the sketch below.
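
A sketch of that restructuring: one self-contained function per walker that steps out and then shrinks immediately, so mapping it over walkers with a process pool never leaves a worker idle between the two phases (illustrative only, not the zeus internals):

import numpy as np

def slice_update(x0, d, log_prob):
    # One walker's complete slice move along direction d
    log_y = log_prob(x0) + np.log(np.random.rand())  # slice level
    u = np.random.rand()
    L, R = -u, 1.0 - u                               # initial interval along d
    while log_prob(x0 + L * d) > log_y:              # step out left
        L -= 1.0
    while log_prob(x0 + R * d) > log_y:              # step out right
        R += 1.0
    while True:                                      # shrink until accepted
        t = L + (R - L) * np.random.rand()
        x1 = x0 + t * d
        if log_prob(x1) > log_y:
            return x1
        if t < 0:
            L = t
        else:
            R = t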

On a somewhat unrelated note, is there a reason the maxsteps are distributed randomly between left and right here: https://github.com/minaskar/zeus/blob/master/zeus/ensemble.py#L566 ?

Print parameter values if likelihood fails

I am working with a problem where some points in parameter space give invalid inputs for which my likelihood fails. When this happens, I would like zeus to print the parameter values at the point that caused the failure (as is done by emcee) to make it easier to debug/understand the physical reason for the likelihood failure.

Can you implement this please?
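
Until such a feature exists, wrapping the log-probability provides the same information (a workaround sketch, not a zeus feature):

def debuggable_log_prob(x, *args):
    # Report the offending parameter vector before re-raising
    try:
        return log_prob(x, *args)
    except Exception:
        print("log_prob failed at parameters:", x)
        raise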

Resuming a chain more efficiently

Currently I think that if you resume a chain (by passing the result of get_last_sample to a new call to run_mcmc) then time is wasted recomputing the posteriors of the starting points, which were already computed last time.

Emcee lets you optionally pass in the posteriors of the starting point, if they are known. Would this be possible for zeus?
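
For comparison, the requested emcee-style pattern would look roughly like this; log_prob0 is a hypothetical parameter name, not part of the zeus API:

last = sampler.get_chain()[-1]
last_log_prob = sampler.get_log_prob()[-1]  # assumed API, mirroring emcee

sampler.run_mcmc(last, nsteps)  # current behaviour: posteriors recomputed
# hypothetical: sampler.run_mcmc(last, nsteps, log_prob0=last_log_prob)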

Computation of R-hat Statistic

First of all, thanks for the package and all your hard work!

I think I've encountered a couple of issues/bugs in the computation of the R-hat statistic.

  1. The first is just a typo, I think. In lines 139-140 the chain means and variances are flattened by the list comprehension, whereas something like:

_means = np.vstack(means)
_vars = np.vstack(vars)

will keep the structure of the chains, so that np.var(_means, ddof=1, axis=0) and np.mean(_vars, axis=0) give the between-chain and within-chain variance, respectively, across all parameters (right now they end up as scalars, since _means and _vars are flat).

  2. This is similar in spirit to Issue #22, but with a more significant effect here: on lines 120-121, where each split is reshaped to (-1, ndim), samples from all walkers are collapsed into each split, homogenizing them and leading to unrealistically low R-hat values. In my case I had ~28 walkers, many of which were stuck in well-separated modes and barely mixing at all, but which nonetheless had a quite low split R-hat as recorded by the callback, because each of the two splits had samples from all 28 walkers, making them statistically similar.

    Since this is an ensemble method I had to spend some time convincing myself, but I really think R-hat should be computed across all (possibly split) walkers, rather than by grouping them together. I can share my trace plots and make a case for this in more detail if it's helpful. With change (1) above, fixing this would just be a matter of removing the reshape operation; nsplits would then determine how many splits are made within each walker.

Thanks again, and let me know what your thoughts are. I'm happy to help implement these changes, too.
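
A minimal sketch of the per-walker split R-hat being proposed, assuming a chain of shape (nsteps, nwalkers, ndim); a hypothetical helper, not the current zeus code:

import numpy as np

def split_rhat(chain):
    n = chain.shape[0] // 2
    # Split each walker's chain in half -> 2 * nwalkers sub-chains
    splits = np.concatenate([chain[:n], chain[n:2 * n]], axis=1)
    means = splits.mean(axis=0)              # (2 * nwalkers, ndim)
    variances = splits.var(axis=0, ddof=1)   # (2 * nwalkers, ndim)
    B = n * means.var(axis=0, ddof=1)        # between-chain variance
    W = variances.mean(axis=0)               # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)              # per-parameter R-hat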

"mk" autocorrelation calculation

Hi @minaskar,

I looked at the autocorrelation stuff you've just added, and it looks really useful. Thanks!

zeus/zeus/autocorr.py

Lines 54 to 65 in 658ad94

if method == 'mk':
    # Minas Karamanis method
    f = _autocorr_func_1d(y.reshape((-1), order='C'))
elif method == 'dfm':
    # Daniel Foreman-Mackey method
    f = np.zeros(y.shape[1])
    for yy in y:
        f += _autocorr_func_1d(yy)
    f /= len(y)
else:
    # Goodman-Weare method
    f = _autocorr_func_1d(np.mean(y, axis=0))

I had a question/comment about the "mk" calculation. It looks like you want to calculate the autocorrelation over each of the chains, which makes more sense to me than averaging the per-chain autocorrelations (dfm) or averaging the chains before calculating the autocorrelation (Goodman-Weare). But with the mk method, simply reshaping the 2-d y array into a 1-d array makes the autocorrelation artificially lower than it should be, because it also picks up correlations across the boundary points between chains. I.e., if the chains each have length n, then the autocorrelation of y2 = y.reshape((-1)) will be artificially lower at each position in y2 that is an integer multiple of n.

Wouldn't it be more appropriate to instead take FFTs of each chain separately and average over the chains in the frequency domain?
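
A sketch of that suggestion, assuming input of shape (nwalkers, nsteps); illustrative only, not the zeus implementation:

import numpy as np

def autocorr_avg_spectrum(y):
    n = y.shape[1]
    yc = y - y.mean(axis=1, keepdims=True)
    # Zero-pad to 2n so the correlation is not circular across chain boundaries
    f = np.fft.rfft(yc, n=2 * n, axis=1)
    power = (f * np.conjugate(f)).real.mean(axis=0)  # average over walkers
    acf = np.fft.irfft(power)[:n]
    return acf / acf[0]                              # normalized autocorrelation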

Walkers Initialization in Basic Use Examples: Shouldn't these use Uniform Distribution?

In the basic use example ( https://zeus-mcmc.readthedocs.io/en/latest/ ) currently there is this line:

start = np.random.randn(nwalkers, ndim)
https://zeus-mcmc.readthedocs.io/en/latest/index.html
Also here:
https://zeus-mcmc.readthedocs.io/en/latest/notebooks/datafit.html

I copied that for how zeus is initialized in my code. But isn't it wrong? If you use a standard normal distribution you are biasing the starting points, and even burn-in probably won't completely remove the effects of that bias.

I think we probably want something like this:

start = 4*(np.random.rand(nwalkers, ndim)-0.5)

That way we get initialization from a bounded uniform distribution from -2 standard deviations to +2 standard deviations.

I have tried both ways in my code (link) and they give similar results.
I have switched to the uniform approach in my code, like below.

walkerStartsFirstTerm = 4*(np.random.rand(nwalkers, numParameters)-0.5) 
walkerStartPoints = walkerStartsFirstTerm*std_prior + mean_prior

Edit: the FAQ suggests starting the walkers in a ball near the MAP, so maybe this is okay: https://zeus-mcmc.readthedocs.io/en/latest/faq.html. It may also mean that for some problems it is better to start with a posterior-maximizing routine (such as Metropolis-Hastings MCMC or Nelder-Mead) followed by Ensemble Slice Sampling.
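
For reference, the "ball near the MAP" initialization from the FAQ might look like this; map_estimate is a hypothetical precomputed point, e.g. the output of an optimizer run:

import numpy as np

map_estimate = np.zeros(ndim)  # hypothetical: obtained from an optimizer
start = map_estimate + 1e-3 * np.random.randn(nwalkers, ndim)  # tight ball around it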

zeus has no attribute 'EnsembleSampler'

I am trying to reproduce the manual's working example, but it gives me the following error:
AttributeError: module 'zeus' has no attribute 'EnsembleSampler'

But it works with emcee though!

[Feature Request] Add a `CITATION.cff`

GitHub recently released a new feature where repository owners can add a CITATION.cff file, making it easy for others to cite the repository. Adding a CITATION.cff would make the attribution process very easy for others (myself included 😅) who want to cite this work.

Bug: Autocorrelation fails with newest Scipy

In autocorr.py, scipy is imported as:

import scipy as sp

The function _autocorr_func_1d uses scipy for the FFT. However, in newer versions of scipy, the fft submodule must be imported explicitly, or an AttributeError will occur:

from scipy import fft
fft.fft(...)  # how to call the FFT function

Please change this soon. Thank you.

Update sklearn requirement to scikit-learn

The sklearn requirement is deprecated and causes issues when installing zeus:

  DEPRECATION: sklearn is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for sklearn: started
  Running setup.py install for sklearn: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Running setup.py install for sklearn did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
      rather than 'sklearn' for pip commands.
      
      Here is how to fix this error in the main use cases:
      - use 'pip install scikit-learn' rather than 'pip install sklearn'
      - replace 'sklearn' by 'scikit-learn' in your pip requirements files
        (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
      - if the 'sklearn' package is used by one of your dependencies,
        it would be great if you take some time to track which package uses
        'sklearn' instead of 'scikit-learn' and report it to their issue tracker
      - as a last resort, set the environment variable
        SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
      
      More information is available at
      https://github.com/scikit-learn/sklearn-pypi-package
      
      If the previous advice does not cover your use case, feel free to report it at
      https://github.com/scikit-learn/sklearn-pypi-package/issues/new
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> sklearn

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Error: Process completed with exit code 1.

I can fix this myself for what I'm doing, but I advise you to change the requirement to scikit-learn so other users don't have issues.

JAX implementation of zeus

Greetings!

I've ported a subset of zeus functionality to the NumPyro project under the sampler name ESS.

(For the uninitiated, NumPyro uses JAX, a library with a numpy-like interface and additional features like JIT compilation and GPU support, in the backend. The upshot is that if you're currently using zeus, switching to NumPyro may give you a dramatic inference speedup!)

I've tried my best to match the existing API. You can use either the NumPyro model specification language

import jax
import jax.numpy as jnp

import numpyro
from numpyro.infer import MCMC, ESS
import numpyro.distributions as dist

n_dim, num_chains = 5, 100
mu, sigma = jnp.zeros(n_dim), jnp.ones(n_dim)

def model(mu, sigma):
    with numpyro.plate('n_dim', n_dim):
        numpyro.sample("x", dist.Normal(mu, sigma))

kernel = ESS(model, moves={ESS.DifferentialMove() : 1})

mcmc = MCMC(kernel, 
            num_warmup=1000,
            num_samples=2000, 
            num_chains=num_chains, 
            chain_method='vectorized')

mcmc.run(jax.random.PRNGKey(0), mu, sigma)
mcmc.print_summary()

or provide your own potential function.

def potential_fn(z):
    return 0.5 * jnp.sum(((z - mu) / sigma) ** 2)

kernel = ESS(potential_fn=potential_fn,
             moves={ESS.DifferentialMove() : 1})

mcmc = MCMC(kernel, 
            num_warmup=1000,
            num_samples=2000, 
            num_chains=num_chains, 
            chain_method='vectorized')

init_params = jax.random.normal(jax.random.PRNGKey(0), 
                                (num_chains, n_dim))

mcmc.run(jax.random.PRNGKey(1), init_params=init_params)
mcmc.print_summary()

Hope this is helpful to some folks!

zeus and emcee

@minaskar Dear Minas, I just tested zeus on my MCMC problem. It runs very well. But I have a question: it seems that zeus is very similar to another MCMC package, emcee, including its function format and some basic usage. So what is zeus's advantage over emcee?

Originally posted by @yuanzunli in #7 (comment)

Can I resume jobs with the same Zeus sampler?

I am looking to make it so that I can cancel a Python script mid-run and then resume the job. With emcee, I do this via the backend feature.

I want to have access to all the previous chains and samples so that I can use previously computed autocorrelation times, and whatnot. So simply passing the previous chains to a new instance of the sampler doesn't quite cut it.

Does zeus have a specific feature that supports resuming jobs? If not, no problem; I can achieve the desired effect by pickling the EnsembleSampler (I hope!). I couldn't find anything in the docs, but figured I'd ask before writing code to do this.
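
For what it's worth, the pickling approach might look like this (a sketch; whether EnsembleSampler pickles cleanly is an assumption to verify):

import pickle

# Save mid-run state (assumption: the sampler object is picklable)
with open("sampler.pkl", "wb") as f:
    pickle.dump(sampler, f)

# ...later, restore and continue from the last positions
with open("sampler.pkl", "rb") as f:
    sampler = pickle.load(f)
sampler.run_mcmc(sampler.get_chain()[-1], nsteps)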

zeus fails to estimate parameters on simple example where emcee has no trouble

I'm trying a basic example, estimating mean and variance of a bivariate normal distribution given independent samples.
For this example, the prior on the mean and variance is that they are positive.

Emcee has no trouble finding the parameters.
Zeus, however, fails with the following error:

RuntimeError: Number of expansions exceeded maximum limit!
Make sure that the pdf is well-defined.
Otherwise increase the maximum limit (maxiter=10^4 by default).

I tried specifying light_mode=True and other moves, without success.

Thanks,
Alexis

test7.zip

Poor/no mixing of walkers

I'm repeatedly getting almost no mixing of the walkers; see the attached figure. In other examples, I saw that walkers didn't move at all (which seems to contradict the stated unit acceptance rate). In all these runs, mu gets arbitrarily close to 0 before the patience setting kicks in. The posterior distribution is unimodal and has some mildly bent degeneracies, but nothing too crazy. Do you have any advice?

[attached figure]

`zeus` has a hard time with a simple transit fit problem (as compared to `emcee`)

Hi @minaskar,

First of all: thanks a ton for this package! We were excited to read about it in your paper, and it looks so promising that our team has started an implementation to add it into our juliet package (nespinoza/juliet#59).

Typically, the first test we do when implementing a new sampler into juliet is to try it on a simple exoplanet transit fit (i.e., fitting the relative flux of an exoplanet; we typically use TESS lightcurves for this). In particular, this is an 8-dimensional problem (so, "easy"). Our initial tests with juliet typically "failed" with zeus, meaning the posteriors looked wonky.

At first I thought maybe we made some mistake implementing zeus, so we decided to re-code a minimal working example for you to look at; you can find that here: https://github.com/nespinoza/zeus-transit-example/blob/master/zeus_tests.ipynb. There, I compare zeus against two other samplers:

  1. emcee, using the exact same log-probability function and starting point for the walkers.
  2. MultiNest (via our juliet package), a nested sampling scheme (i.e., sampling directly from the prior, so no need to define starting points) for which this problem is a very easy one.

Interestingly, zeus typically fails about a third of the way through with an error that reads:

RuntimeError: Number of contractions exceeded maximum limit!
Make sure that the pdf is well-defined.
Otherwise increase the maximum limit (maxiter=10^4 by default).

If I enlarge maxiter, it generally just gets stuck again. If you run it a handful of times, though, you might get lucky and be able to sample from the posterior (as is done in the notebook, but I encourage you to re-run the notebook so you can see the error mentioned above). Here's the comparison of zeus versus emcee samples:

[screenshots: zeus vs. emcee posterior comparison]

As in our juliet tests (not shown here, but we can share them with you if you want), the zeus posteriors look a bit weird. Some walkers apparently go crazy, and try to do their own sampling away from the rest. As a comparison, here's the posterior with MultiNest:

[figure: MultiNest posterior]

As you can see, emcee and MultiNest agree very well on the posterior. But, again, we don't know why zeus has such a hard time with this particular problem. So, two questions:

  1. Why is zeus getting stuck in this particular problem?
  2. Why does it produce such weird-looking posteriors when it doesn't get stuck?

Thanks in advance for any pointers!
Néstor

logprob values

Hello,

I cannot tell from the documentation whether the log-probability or log-likelihood values are stored. Can you point me to this?
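
If zeus mirrors the emcee API here, an assumption worth checking against the docs, the stored values should be retrievable via get_log_prob:

log_probs = sampler.get_log_prob()            # assumed shape (nsteps, nwalkers)
flat_log_probs = sampler.get_log_prob(flat=True)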

Constrained sampling in parameter space

Hi, I'm trying to fit a model using zeus; however, I don't have an analytic model, which forces me to interpolate my model data. I run into issues with the sampler when the walkers explore the parameter space too far from where I expect the MAP to be. For instance, I define one parameter to have a uniform prior on (0.8, 1.2), but when the walkers explore ~1.5 my model interpolation breaks (above the interpolation bounds). I have tried filling the absent values (outside the interpolation range) with NaNs or infs, but those seem to be problematic. Extrapolation also seems to be problematic, since I get

RuntimeError: Number of expansions exceeded maximum limit! 
Make sure that the pdf is well-defined. 
Otherwise increase the maximum limit (maxiter=10^4 by default).

But this could be another bug, and in any case I doubt extrapolating is a good approach.
In short, do you have some way to constrain walkers to a certain domain in the parameter space?
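
One common workaround, independent of zeus, is to reparameterize so the sampler works in an unbounded space and the bounded parameter can never leave the interpolation domain; a sketch using a logistic transform and its Jacobian correction (log_likelihood here is a stand-in for the interpolated model):

import numpy as np

lo, hi = 0.8, 1.2  # prior support of the bounded parameter

def log_likelihood(theta):
    return 0.0  # stand-in for the interpolated model likelihood

def log_prob(u):
    # Map the unbounded variable u into (lo, hi) with a logistic transform
    theta = lo + (hi - lo) / (1.0 + np.exp(-u))
    # log|d theta / d u| keeps the implied prior uniform on (lo, hi)
    log_jac = np.log(hi - lo) - u - 2.0 * np.log1p(np.exp(-u))
    return log_likelihood(theta) + log_jac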

Thanks in advance,

Deprecation Warning for Collections

...\anaconda3\envs\zeus\Scripts\zeus.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Iterable as IterableType

It seems this should become "from collections.abc import Iterable as IterableType"

However, I could not find zeus.py in the repository, so I could not make a pull request to fix this.

I imagine this is likely to become an error in the next year or so, if not fixed.
