joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences.

Home Page: https://dynesty.readthedocs.io/
License: MIT License
Although no one should ever use this, I should add in a simple rejection sampler that always samples uniformly from the unit N-cube. This will be useful for shrinkage tests and baseline performance tests.
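A minimal sketch of what this could look like (the function and argument names are hypothetical, not dynesty's API): propose uniformly from the unit N-cube and reject until the current likelihood bound is satisfied.

```python
import random

def sample_unit_cube(loglike, logl_min, ndim, rstate=random):
    """Brute-force rejection sampling: draw uniformly from the unit
    N-cube until a draw exceeds the current log-likelihood bound."""
    ncall = 0
    while True:
        u = [rstate.random() for _ in range(ndim)]  # uniform in [0, 1)^ndim
        ncall += 1
        if loglike(u) > logl_min:
            return u, ncall

# toy constraint: accept points whose first coordinate is below 0.5
u, ncall = sample_unit_cube(lambda u: -u[0], logl_min=-0.5, ndim=3)
```

The efficiency decays exponentially as the constrained volume shrinks, which is exactly what makes it a useful worst-case baseline for timing and shrinkage tests.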
Now's the time to actually write stuff up! Testing should proceed simultaneously as the final features get added.
While the current implementation of `dynesty` computes the first and second moments of the volume weights `logwt`, the distribution of the full set of weights is (in theory) exactly known. The shrinkage factor `t_i` at iteration `i` is i.i.d. `Beta(K, 1)`, where `K` is the current number of live points, leading to a cumulative shrinkage of `t_1 * ... * t_i`. The prior volume associated with the `j`-th remaining live point (after `j` iterations once we've terminated our run after collecting `N` dead points) is distributed as `Beta(K-j+1, j)`. The entire set of live-point volumes can be jointly distributed as `(Y_1 + ... + Y_{K-j+1}) / (Y_1 + ... + Y_{K+1})`, where the `K+1` `Y`'s are i.i.d. `Expo` (standard exponential).

Add in a function that can quickly simulate these statistical uncertainties by drawing `N` `t_i ~ Beta(K, 1)` and `K+1` `Y_i ~ Expo` random variables.
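A rough stdlib-only sketch of such a simulator using the distributions quoted above (the function name and run representation are illustrative):

```python
import math
import random

def simulate_volumes(ndead, nlive, rstate=random):
    """Simulate one realization of the log prior volumes for a static run:
    `ndead` dead points followed by the `nlive` final live points."""
    logvols = []
    logx = 0.0
    # Dead points: X_i = t_1 * ... * t_i with t_i ~ Beta(K, 1),
    # drawn via the inverse CDF, t = u**(1/K).
    for _ in range(ndead):
        logx += math.log(rstate.random()) / nlive
        logvols.append(logx)
    # Final live points: joint volumes expressed through K+1 i.i.d.
    # standard-exponential draws Y_1, ..., Y_{K+1}.
    y = [rstate.expovariate(1.0) for _ in range(nlive + 1)]
    total = sum(y)
    for j in range(1, nlive + 1):
        ratio = sum(y[: nlive - j + 1]) / total  # marginally Beta(K-j+1, j)
        logvols.append(logx + math.log(ratio))
    return logvols
```

Repeatedly calling this and recomputing `logwt`/`logz` from each realization would give the exact sampling distribution of the weights rather than just its first two moments.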
Need to fix those utilities to deal with results from `DynamicSampler`. Also, maybe I should implement an "approximate" vs "exact" mode that simulates from the marginals rather than the joint distribution (depending on timing tests).
Users might want to "continue" a previous run after they've dumped the set of previous results to disk. Either:

- allow a saved `Results` dictionary to be used to initialize a sampler, or
- combine the old and new runs afterwards (`merge_runs`).

When using `dynesty` on large parallel CPU clusters, `matplotlib` is often not needed and sometimes not available, resulting in an `ImportError`. It would therefore be better/more efficient if `dynesty` only imported `matplotlib` when the user wants to use the visualization submodule.
According to @guillochon, `six` is better for Python 2/3 compatibility.
Although the code is written in a parallelizable-ish form, I haven't done any actual tests. Those need to happen!
Two issues here. Currently, the target efficiencies for `rwalk` and `rtraj` are hard-coded. I should make these arguments that can be passed by the user in case they want to play around with them.

In addition, the `blob` arguments I send back to the chosen `NestedSampler` are only used to update the scale and are not saved anywhere. Users might want access to these if they want to tune parameters and examine additional quantities besides just the scale factors. I should have both the scale factors and blobs returned to the user. This should be implemented using a `return_tuning=True` flag, which returns the scale factors (`scale`) and tuning parameters (`blob`).
This is a multi-part issue. Essentially, the typical way we compute uncertainties (using the KL divergence, aka the "information gain" from prior to posterior) only works if the number of live points is fixed. I believe this generalizes in a straightforward manner to deal with uncertainties under variable prior volume compression (logarithmic vs uniform, and for varying numbers of points). Implement this fix in both the `NestedSampler` and `DynamicSampler`.
Buchner (2014) outlined a simple algorithm to propose new points that passes shrinkage tests using the Euclidean (`RADFRIENDS`) and sup (`SUPFRIENDS`) norms. I'd like to implement these within the collection of local sampler classes.
Currently some defaults are accidentally hard-coded. Go through and fix this so that users can plot things with more flexibility.
While I'm sure there are bugs that I'll need to rigorously test out, I think the general version has enough base functionality to finally think about adding in dynamic sampling. There are two basic approaches that I can take.
- The simple approach
- The integrated approach

Both would begin with an initial run using `nlive=2*ndim`, followed by one of the other approaches (default `'multi'`) afterwards, as set by the user. There also needs to be some criterion to trigger proposal updates using the live points at each iteration. The user should be able to determine this schedule (for both proposal changes and updates).

Both approaches will require the ability to regenerate past runs, which will require some additional functionality. This might require some/a lot of internal changes to how I currently have the `Sampler` objects organized, but not a lot of modifications to the base methods.
In the PolyChord paper, they discuss slice sampling in spaces that have been transformed such that the appropriate step size is ~1. I had interpreted this as saying we should propose steps and directions based on, e.g., the set of bounding ellipsoids. However, I now believe that this may be incorrect: the correct way is to always propose orthogonal steps on the native unit cube, but scale the initial step size based on the proposal distribution in that direction. This is because linear transformations make it more likely that we propose in certain directions vs others, which we fail to correct for. This doesn't matter as much for random walks since we always satisfy detailed balance with any static proposal (and the "right" way to do proposals is Quite Hard), but does matter for slices since we've now distorted our sampling space.
This shouldn't be a huge change I think: we just need to map the orthogonal unit cube basis onto the ellipsoid to get proper scale factors.
Skilling (2006) demonstrated that different nested sampling runs can be trivially added together. This property of nested sampling has its roots (I think) in the properties of Poisson processes. One of the key insights presented in Higson et al. (2017) is that this property also works in reverse: it is straightforward to deconstruct a nested sampling run with `K` live points into a series of `K` independent runs with 1 live point each (a "strand"). This enables you to probe some of the underlying sampling errors (which are separate from the simulated weights) within a single run using, e.g., bootstrap techniques.
To do:

- Add a `split` function that takes a run with `K` live points and splits it into `K` strands.
- Add a `merge` function that takes a set of `N` runs with `K_1, ..., K_N` live points and merges them into one run with `K_1 + ... + K_N` live points.
- Add a `resample` function that splits a run with `K` live points into `K` strands, samples `K` strands with replacement, and recombines them into one run.
- Add a `simulate_error` function that resamples the run and then simulates a corresponding set of weights.

In my initial set of `dynesty` commits when changing over from `nestle`, I removed the entire test suite (along with other things) to clear up the repository. However, I'd like to add the tests back in, both as an executable and/or as an interactive notebook. This should also include shrinkage tests.
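A toy sketch of the split/merge/resample idea (the `strand`/`logl` record layout is invented for illustration; dynesty would need to track strand membership internally):

```python
import random

def split(samples):
    """Split a run into strands keyed by the id of the live-point
    'thread' each sample belongs to."""
    strands = {}
    for s in samples:
        strands.setdefault(s["strand"], []).append(s)
    return list(strands.values())

def merge(strands):
    """Merge strands back into one run by sorting on log-likelihood,
    which is how nested sampling runs combine (Skilling 2006)."""
    merged = [s for strand in strands for s in strand]
    return sorted(merged, key=lambda s: s["logl"])

def resample(samples, rstate=random):
    """Bootstrap a run: split into strands, draw strands with
    replacement, and merge back into a single run."""
    strands = split(samples)
    boot = [rstate.choice(strands) for _ in range(len(strands))]
    return merge(boot)
```

Repeating `resample` many times and recomputing the evidence from each bootstrap replicate probes the sampling error, separate from the simulated-weight error.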
Add this to the default output print function regardless of whether we're sampling in a batch or not.
The current version of `dynesty` implements a first-order integration scheme to estimate the volume weights `logwt` from the current expected volume `logvol`, both for the nested set of "dead" points and the final remaining set of "live" points. We want to switch to a second-order scheme to both reduce integration error and improve clarity.
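For reference, a plain-Python sketch of the second-order (trapezoid-rule) weight estimate, `w_i = L_i * (X_{i-1} - X_{i+1}) / 2`. It works with linear volumes for clarity and assumes `X_0 = 1` and a final volume of 0; a production version would stay in log space throughout:

```python
import math

def trapezoid_logwt(logl, logvol):
    """Second-order (trapezoid-rule) estimate of the importance weights,
    w_i = L_i * (X_{i-1} - X_{i+1}) / 2."""
    # Pad with X_0 = 1 (the full prior) and a final volume of 0
    # (assuming the run integrates all the way down).
    vols = [1.0] + [math.exp(lv) for lv in logvol] + [0.0]
    logwt = []
    for i, ll in enumerate(logl):
        dvol = 0.5 * (vols[i] - vols[i + 2])  # (X_{i-1} - X_{i+1}) / 2
        logwt.append(ll + math.log(dvol))
    return logwt

# two dead points at volumes 0.5 and 0.25 with flat likelihood L = 1
wts = trapezoid_logwt([0.0, 0.0], [math.log(0.5), math.log(0.25)])
```

Compared to the first-order `w_i = L_i * (X_{i-1} - X_i)`, the centered difference halves the truncation error of the evidence quadrature.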
Buchner (2014) outlined a nice "shrinkage test" to check whether nested sampling algorithms shrink the prior volume at the expected (proper) rate (i.e., by `Beta(K, 1)` for `K` live points). He found that the ellipsoidal decomposition algorithm employed by `MultiNEST` (similar to the one used in `nestle` and `dynesty`) shrinks too quickly, and suggested a fix using bootstrapping to organically derive the ellipsoid enlargement factors to mitigate this. A similar scheme was implemented in the construction of his `RADFRIENDS` algorithm.
As I've gone along and started adding new sampling methods, I've noticed that no sampler can really function completely independently of some method for bounding the distribution -- they all require some tunable proposal distributions, especially to deal with multiple modes. As such, it makes more and more sense to allow a sampler to bound with one method and propose with another. A list of these methods (so far) is below.

Bounding:

Sampling:

I should change how the samplers work to allow users to specify both options via some `bound` and `sampling` keywords. This also solves several other problems: you want to propose at different modes based on their volume, you want your proposal tuned (to some extent) to where you are based on the distribution of live points, etc.
When computing the estimated `logzerr`, the estimates can often be ill-defined, leading to warnings being raised during runtime. These interrupt the print statements and are generally annoying because they occur reasonably often, so by default they should probably be suppressed.
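One option (a sketch, not necessarily how dynesty should do it) is to scope the suppression narrowly so unrelated warnings still surface, using `warnings.catch_warnings()` or `numpy.errstate`:

```python
import warnings
import numpy as np

# Scope the suppression to this block only, so genuine problems
# elsewhere in a run still surface.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    logzvar = np.array([0.04, -1.0])  # an ill-defined (negative) variance
    logzerr = np.sqrt(logzvar)        # second entry becomes nan, no warning

# numpy's own context manager does the same for floating-point errors:
with np.errstate(invalid="ignore"):
    logzerr = np.sqrt(logzvar)
```

Both context managers restore the previous warning state on exit, which is safer than a global `warnings.filterwarnings("ignore")`.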
After some work, I realized that the simple `1/K_i` compression scheme outlined in Higson et al. (2017) is actually incorrect. When there is a decreasing number of live points, we are actually sampling from the set of order statistics associated with the previous (local) maximum number of live points. This leads to the following behavior:

This is fine, but slightly changes the weights that we will assign our samples. I need to change the default behavior in `DynamicSampler` to accommodate this.
Plotting the raw weights has so far confused a lot of initial users. I think the better strategy is to natively implement a KDE to generate the smoothed weight distribution in `runplot`, but give users an option to disable it if they want to see the actual weights. My guess is this should be implemented in all plots that explicitly show the weights.
I should add one.
When running the dynamic nested sampler, it failed (presumably after completing its inner loop) with the following message:
```
run: 0 | iter: 1513 | nc: 1 | ncall: 51328 | eff(%): 2.950 | logz: -7727.260 +/- 0.227 | dlogz: 13.417 > 0.500
run: 0 | iter: 1514 | nc: 1 | ncall: 51329 | eff(%): 2.952 | logz: -7716.199 +/- 0.660 | dlogz: 2.083 > 0.500
run: 0 | iter: 1515 | nc: 1 | ncall: 51330 | eff(%): 2.953 | logz: -7714.875 +/- nan | dlogz: 0.660 > 0.500
Traceback (most recent call last):
  File "pcmd_integrate.py", line 55, in <module>
    results = fit_model.nested_integrate(**args)
  File "/n/home01/bcook/pixcmd/pcmdpy/fit_model.py", line 202, in nested_integrate
    sampler.run_nested(ninit=N_points, maxcall_init=max_call, dlogz_init=dlogz)
  File "/n/home01/bcook/dynesty/dynesty/dynamicsampler.py", line 1121, in run_nested
    logl_bounds = wt_function(res, wt_kwargs)
  File "/n/home01/bcook/dynesty/dynesty/dynamicsampler.py", line 82, in weight_function
    bounds = (min(bounds) - lpad, min(max(bounds) + lpad, nsamps - 1))
ValueError: min() arg is an empty sequence
```

It seems like, for some reason, the default weight function is not finding any "bounds" in `dynamicsampler.py`, line 81:

```python
bounds = np.arange(nsamps)[weight > maxfrac * max(weight)]
```
First of all, thanks for this awesome tool!

However, there is a small bug in version 0.8.0 (line 1145 in cd655fd): the return value of the function `make_eigvals_positive` is not initialized if the `if` statement fails.
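A sketch of the fix, based on the `nestle` function this code descends from (exact variable names may differ from dynesty 0.8.0): return the input matrix unchanged when no eigenvalues need repairing.

```python
import numpy as np

def make_eigvals_positive(am, targetprod):
    """Increase any tiny eigenvalues of the symmetric matrix `am` so the
    product of the repaired eigenvalues matches `targetprod`. Unlike the
    buggy version, this returns a value on *every* code path."""
    w, v = np.linalg.eigh(am)          # eigh, since am is symmetric
    mask = w < 1e-10
    if np.any(mask):
        nzprod = np.prod(w[~mask])     # product of the "healthy" eigenvalues
        nzeros = mask.sum()
        w[mask] = (targetprod / nzprod) ** (1.0 / nzeros)
        am = np.dot(v, np.dot(np.diag(w), v.T))
    return am                          # previously undefined when mask was empty
```

The regression is exactly the no-op case: a well-conditioned matrix should round-trip unchanged instead of raising `UnboundLocalError`.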
In high dimensions, we cannot rely on standard rejection sampling to properly probe the distribution: no matter how clever we make our proposals (ellipsoids, MCMC, etc.), sooner or later we will inevitably lose to the `~e^-d` efficiency falloff as the volume of the ellipsoids blows up. To solve this, we can use slice sampling, which instead scales as `~d^3` and has been used in nested sampling via, e.g., `PolyChord`.

To start, take a look at @pacargile's slice-sampling `nestle` fork based on @bd-j's `squish` code, and combine this with the basic algorithm outlined in the `PolyChord` paper.
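For orientation, a stdlib-only sketch of a single univariate slice update under the hard nested sampling constraint (Neal 2003 stepping-out plus shrinkage; a simplification for illustration, not PolyChord's actual algorithm):

```python
import random

def slice_1d(loglike, x0, logl_min, w=1.0, rstate=random):
    """One univariate slice-sampling update subject to the hard
    constraint loglike(x) > logl_min: step out, then shrink."""
    # Step out: randomly position an interval of width w around x0,
    # then expand each end until it falls outside the constraint.
    left = x0 - w * rstate.random()
    right = left + w
    while loglike(left) > logl_min:
        left -= w
    while loglike(right) > logl_min:
        right += w
    # Shrink: sample uniformly on the interval, shrinking toward x0
    # after each rejection, until we land inside the constraint.
    while True:
        x1 = left + (right - left) * rstate.random()
        if loglike(x1) > logl_min:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# toy example: L(x) = -x^2 with logl_min = -1, so the slice is (-1, 1)
x = slice_1d(lambda x: -x * x, x0=0.0, logl_min=-1.0)
```

In the nested sampling setting the "slice height" is fixed at `logl_min`, so each update draws uniformly from the constrained region along one direction; this per-direction cost is what gives the polynomial (rather than exponential) dimension scaling.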
Notes:
This is typically referred to as "Galilean Monte Carlo" (GMC) (see, e.g., this). The basic idea is to initialize a particle with some velocity `v` and launch it in a random direction. It then proceeds unimpeded until it reflects off the likelihood boundary set by `L_j > L_i`, computed using the normal vector at that location. If the reflected particle is itself still outside the boundary, we reverse our trajectory. This procedure is analogous to Hamiltonian Monte Carlo (HMC) methods, except that our potential function is a uniform well with infinitely steep boundaries rather than a smooth log-posterior.

While the name makes sense (I believe it's a derivative of "Hamiltonian"), I find the whole MCMC analogy (and renaming) to be more confusing than enlightening at this point, especially concerning the underlying dynamics. For instance, the "MCMC"-like procedure employed to choose new live points is really a bounded random walk, so why not just call it a random walk? Likewise, Galilean MC is just a bounded random trajectory bouncing off surfaces, so why not just call it a random trajectory?
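Whatever the name, the core reflection step is simple; a minimal pure-Python sketch (assuming the boundary normal is already a unit vector):

```python
def reflect(v, n):
    """Specular reflection of velocity v off a boundary with unit
    normal n: v' = v - 2 (v . n) n."""
    vn = sum(vi * ni for vi, ni in zip(v, n))
    return [vi - 2.0 * vn * ni for vi, ni in zip(v, n)]

# bouncing off a wall with normal (1, 0) flips the x-component
v_new = reflect([1.0, 2.0], [1.0, 0.0])  # -> [-1.0, 2.0]
```

In practice the normal would come from the gradient of the log-likelihood at the crossing point; if the reflected step still lands outside the constraint, the trajectory is reversed as described above.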
Notes:
It might be cleaner to move the logic in `bounding_ellipsoids` for generating the ellipsoids into the ellipsoids themselves (as a method). You could then supply `points` and `pointvol` to this method and the ellipsoid would update itself (instead of having to instantiate a new `Ellipsoid` each time).
It might be useful to allow the sampling technique to vary during a fit as a function of efficiency (or, in `emcee` parlance, "acceptance fraction") and the number of dimensions. For example, if the efficiency of a uniform sampler drops to 10% or less in a high-dimensional space, it would save time to switch to a slice sampler.
Several people have mentioned trying to add some type of `emcee`-style logo. The best recommendation so far has been a crown in a bird's nest (or something similar).
At each iteration, we also want to save the relevant scale factor. This is important for initializing a good guess in the dynamic sampler when using non-uniform sampling methods such as `slice`, since bad scale factors near the posterior can lead to extremely correlated samples that can corrupt our results.
`xrange` only exists in Python 2. However, the `concurrent.futures` package with `ThreadPoolExecutor` or `ProcessPoolExecutor` only exists in Python 3.
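For reference, the Python 3 interface in question; `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same `Executor.map` API (the likelihood and points below are toy examples):

```python
from concurrent.futures import ThreadPoolExecutor

def loglike(x):
    """Toy likelihood: negative squared radius."""
    return -sum(xi * xi for xi in x)

points = [[0.1 * i, 0.2 * i] for i in range(8)]

# Evaluate the likelihood over a batch of proposed points in parallel.
# ThreadPoolExecutor shares memory; ProcessPoolExecutor has the same
# API and is the better fit for CPU-bound likelihoods.
with ThreadPoolExecutor(max_workers=4) as executor:
    logls = list(executor.map(loglike, points))
```

Since `map` preserves input order, the results drop straight into the usual per-point bookkeeping.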
The user-defined `live_points` input is not being sent to the `DynamicSampler` method within the `DynamicNestedSampler` function.
The URL for the documentation in the README.md should be http://dynesty.readthedocs.io/, not dynesty.readthedocs.io. Currently the latter is interpreted as a local file.
While dynamic sampling might eliminate the need for this, I want to make sure I keep things structured so this is still possible. The idea is that once we have a `results` object (or, more generally, a list of samples and associated proposals), we'll be able to run this over the results to get additional "bonus" weights as an add-on. This should be fine as long as I keep track of all the rejected proposals during each iteration.
When running a test using the dynamic nested sampler, logzvar came out negative after a couple thousand iterations:
```
run: 0 | iter: 1592 | nc: 1 | ncall: 50649 | eff(%): 3.145 | logz: -7747.346 +/- 0.219 | dlogz: 36.407 > 0.500
run: 0 | iter: 1593 | nc: 1 | ncall: 50650 | eff(%): 3.147 | logz: -7744.512 +/- 0.454 | dlogz: 33.439 > 0.500
run: 0 | iter: 1594 | nc: 1 | ncall: 50651 | eff(%): 3.149 | logz: -7741.552 +/- 0.457 | dlogz: 30.325 > 0.500
run: 0 | iter: 1595 | nc: 1 | ncall: 50652 | eff(%): 3.151 | logz: -7740.474 +/- 0.242 | dlogz: 29.065 > 0.500
Traceback (most recent call last):
  File "pcmd_integrate.py", line 55, in <module>
    results = fit_model.nested_integrate(**args)
  File "/n/home01/bcook/pixcmd/pcmdpy/fit_model.py", line 202, in nested_integrate
    sampler.run_nested(ninit=N_points, maxcall_init=max_call, dlogz_init=dlogz)
  File "/n/home01/bcook/dynesty/dynesty/dynamicsampler.py", line 1099, in run_nested
    logzerr = math.sqrt(logzvar)
ValueError: math domain error
```
Currently `dynesty` is unable to regenerate the proposal distributions used over the course of the run for any sampler. Add in functionality that saves the proposal distribution directly (e.g., a single `Ellipsoid`) or indirectly (via the initial state used to construct the proposal) so they can be regenerated later. This will be crucial for visualizing runs. It will also be useful if we want to start moving toward utilizing (pseudo-)importance sampling weights.
@bd-j and @pacargile would like to turn the main nested sampling loop into some type of `Iterator`, similar to how `emcee` works.
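A toy sketch of what an iterator-style interface could look like (the generator body and result fields are invented placeholders, not dynesty's API):

```python
def sample(n_iter):
    """Toy generator mirroring the proposed interface: yield
    per-iteration results so users can monitor or interrupt the run,
    emcee-style."""
    logz = 0.0
    for it in range(n_iter):
        logz -= 1.0 / (it + 1)  # placeholder update; real code would sample
        yield {"it": it, "logz": logz}

# Users drive the loop themselves and can break out at any point:
history = []
for res in sample(5):
    history.append(res["logz"])
    if res["logz"] < -2.0:
        break
```

The appeal is that progress reporting, checkpointing, and custom stopping criteria all live in user code rather than in a monolithic `run` method.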
Currently, the only errors included are in `runplot` and use the "first-order" approximation. However, `simulate_run`, `resample_run`, and `sample_run` can all numerically simulate the error (entirely or in part). This feature should be incorporated into at least `runplot`.
Could you add `args=` and `kwargs=` when initializing the `NestedSampler` object that would take an argument list and/or a keyword dictionary to pass to the likelihood function? This would allow for likelihood functions of the form `lnlike(v, *args, **kwargs)` and for backwards compatibility with `emcee`.
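One way this could be wired up internally (a sketch mirroring `emcee`'s convention; the class and argument names are illustrative, not dynesty's actual API):

```python
class LogLikelihood:
    """Wrapper that carries extra positional/keyword arguments through
    to the user's likelihood, emcee-style."""

    def __init__(self, lnlike, args=None, kwargs=None):
        self.lnlike = lnlike
        self.args = args or []
        self.kwargs = kwargs or {}

    def __call__(self, v):
        # The sampler only ever sees a one-argument callable.
        return self.lnlike(v, *self.args, **self.kwargs)

# usage: lnlike(v, data, sigma=...) without closures or globals
def lnlike(v, data, sigma=1.0):
    return -sum((vi - di) ** 2 for vi, di in zip(v, data)) / (2 * sigma ** 2)

wrapped = LogLikelihood(lnlike, args=[[1.0, 2.0]], kwargs={"sigma": 0.5})
```

The sampler then stores and calls only the wrapped object, so no other internal code needs to know about the extra arguments.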
Many sampling codes (`PyMC3`, etc.) have some simple plotting utilities to visualize their final results and/or their overall runs. I'd like to add a few of these to `dynesty` so users can make quick-and-dirty diagnostics without too much hassle.
To do:
Bounding the original set of points sampled from the unit cube is not helpful for two reasons. First, the shape of the points follows that of the cube, leading to weird decompositions that do not conform to the shape of the general likelihood surface. Second, and more importantly, the volume encompassed by the bound significantly exceeds the original cube by construction, increasing exponentially as a function of dimension. To fix this, I should try and institute some type of "bounding delay" on the first set of points where we continue to sample from the prior until the overall efficiency drops below some threshold. If possible, this should be implemented in both the standard nested sampler and the dynamic sampler.
As it stands, the entirety of `dynesty` is stored in one giant file whose overall structure is inherited from `nestle`. It would be great to break this up into a collection of files to allow for better organization of classes, etc.
While trying the `prospector` mock demo with `dynesty`, I get

```
RuntimeError: Negative h encountered (h=-1.5538444519). Please report this as a likely bug.
```

when using the `NestedSampler` in multi/unif mode with `bootstrap=0`. See
https://github.com/bd-j/prospector/blob/dydemo/demo/Interactive_Nested_Demo.ipynb
Currently, the dynamic sampler works by "hacking" an internal `nestedsampler` object. One consequence is that the log-(prior) volume quantities tracked internally (and used to determine the minimum `pointvol`) continue to shrink even though the actual log-volume probed in each batch is different. Since the internal results are garbage anyway, and by default the `dynamicsampler` object doesn't save any samples within the internal `nestedsampler` object, I should implement a fix that replaces the internal `sampler.saved_logvol` and `sampler.dlv` values each time it updates with a new batch, ensuring the computed `pointvol` remains robust.
I need to test whether using the KL divergence between a "baseline" run and a "sampled" (resampled + reweighted) run can be used to determine convergence to the posterior distribution.
While I removed the `mcmc` class and `'classic'` sampling method from `nestle` in the current implementation of `dynesty`, I'd like to add them back in soon.
Currently, a number of utilities do not work properly with dynamic nested sampling runs. These include `resample_run` (and by extension `sample_run`), `merge_runs`, and `unravel_run`.
There are two issues. First, to properly merge runs with variable numbers of live points, I need to know the lower `logl_min` bound that was used to initially add each point. I can then use the same procedure as the one implemented in the dynamic nested sampler to merge strands in one at a time (or in batches).

Second, I need a way to resample the strands that properly preserves the information content of a dynamic sampling run (i.e., the "many paths" interpretation that works so well for standard runs). This requires splitting the run into two groups of points:

Resampling then needs to be done within each group of points before combining the runs, to preserve their respective information content. Afterwards, points can be combined using the same procedure as used for merging.

To get these to work, I have to include additional information within the `Results` from dynamic runs, such as the batch ID associated with each group of points and the `logl` bounds associated with each batch.
To accompany this bugfix, I should also make sure to test the veracity of our results using repeated nested sampling runs.
To do:
- Update `unravel_run` to store relevant batch information when unraveling a dynamic run (allowing for re-assembly).
- Update `merge_runs` to properly deal with dynamic sampling runs.
- Update `resample_run` to deal with dynamic sampling runs, as described above.

Currently these are done sequentially, but there's absolutely no reason why they can't also be done in parallel using the same scheme as the live-point proposals. Once the parallelization scheme is finalized I can come back to this.