kiudee / bayes-skopt
A fully Bayesian implementation of sequential model-based optimization
Home Page: https://bayes-skopt.readthedocs.io/
License: Other
Currently, TopTwoEI only implements the "TopTwo" part of the algorithm. In the actual algorithm, one decides randomly between EI and TopTwoEI.
zeus is a recently released black-box MCMC library which looks like it could be a drop-in replacement for emcee.
If it results in either (1) faster performance or (2) faster/more robust convergence, it could be worthwhile.
The repository currently uses travis-ci.com, which is no longer free for open-source projects. It should therefore migrate to GitHub Actions; this is also a good opportunity to move from tox to nox.
Reference: https://arxiv.org/pdf/1402.0929.pdf
The computation of tell can become quite slow when the number of observations grows. Since we optimize slow black-box functions, it would be useful to have a tell_async method which does not block.
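One possible shape for this, sketched with Python's standard-library executor; AsyncTeller and tell_async are hypothetical names, not part of bask's API:

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncTeller:
    """Hypothetical wrapper adding a non-blocking tell to an optimizer."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        # A single worker keeps model updates strictly ordered.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def tell_async(self, x, y):
        # Ensure the previous update finished before queueing the next,
        # so observations are incorporated in order.
        if self._pending is not None:
            self._pending.result()
        self._pending = self._executor.submit(self.optimizer.tell, x, y)
        return self._pending  # a Future; call .result() to block
```

The caller can then keep evaluating the black-box function while the slow model update runs in the background.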
The sample method behaves like the fit function if data is provided. Therefore, all fitted attributes need to be updated as well.
Is your feature request related to a problem? Please describe.
The R2 sequence we currently use for initialization works well for low dimensions but has visible patterns in higher dimensions.
Steinerberger (2019) proposes a simple energy functional which can be used to add discrepancy-minimizing points to an existing design:
Describe alternatives you've considered
I investigated different methods for generating blue noise (e.g. using optimal transport). But they are hard to implement and require expensive computation.
Additional context
Here is a sample of the Steinerberger sequence (1 initial random point)
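A rough sketch of the greedy construction, assuming the energy functional has the per-coordinate product form 1 − log(2 sin(π|x_j − y_j|)) used in Steinerberger (2019); the exact functional should be checked against the paper, and add_point is purely illustrative:

```python
import numpy as np

def steinerberger_energy(x, points):
    # Energy of candidate x with respect to existing points in [0, 1]^d.
    # Each coordinate contributes 1 - log(2 sin(pi * |x_j - y_j|));
    # the sine makes the distance toroidal automatically.
    diff = np.abs(points - x)                        # shape (n, d)
    terms = 1.0 - np.log(2.0 * np.sin(np.pi * diff))
    return np.prod(terms, axis=1).sum()

def add_point(points, n_candidates=4096, rng=None):
    # Greedy step: evaluate the energy on random candidates and keep
    # the minimizer, extending the design one point at a time.
    rng = np.random.default_rng(rng)
    candidates = rng.random((n_candidates, points.shape[1]))
    energies = [steinerberger_energy(c, points) for c in candidates]
    return candidates[int(np.argmin(energies))]
```

Repeatedly calling add_point on a design that starts from one random point would yield a sequence like the sample above.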
Emcee 3.1.0 was recently released and an optimization using bask raised
File "/mnt/tuning-server/venv/lib/python3.8/site-packages/emcee/backends/backend.py", line 175, in grow
a = np.empty((i, self.nwalkers, self.ndim), dtype=self.dtype)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
The library bayes-skopt currently supports all 3.x.x versions:
Line 30 in 7f3e7af
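The traceback boils down to NumPy rejecting a float inside a shape tuple; a minimal reproduction (the actual fix would be to ensure that whatever bask passes to emcee for the chain length is an integer):

```python
import numpy as np

# Shape entries must be integers; a numpy.float64 raises exactly the
# TypeError seen in the traceback above.
try:
    np.empty((np.float64(5), 2))
    message = "no error"
except TypeError as exc:
    message = str(exc)

# Casting the offending value restores the expected behavior.
arr = np.empty((int(np.float64(5)), 2))
```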
Currently, the optimizer calls the fit function every time the tell function is called (once the init phase is over). This is unnecessary, since MCMC sampling is used to update the parameters of the GP.
Lines 93 to 100 in 071d504
Is your feature request related to a problem? Please describe.
The Optimizer.run method does not offer all of the parameters the tell method offers.
Describe the solution you'd like
The parameters and their default values should be consistent.
It is possible for users to change the acquisition function manually during an optimization run.
Example:
from bask import Optimizer
from bask.acquisition import PVRS
opt = Optimizer(...)
opt.acq_func = PVRS()
However, this only takes effect after the next point has been evaluated. It would be useful to implement a setter which triggers the recomputation.
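A minimal sketch of such a setter; this Optimizer is a stand-in, and _recompute_proposal is a hypothetical hook rather than an existing bask method:

```python
class Optimizer:
    """Minimal stand-in showing an acq_func property with a setter."""

    def __init__(self, acq_func):
        self._acq_func = acq_func
        self.recomputed = 0

    @property
    def acq_func(self):
        return self._acq_func

    @acq_func.setter
    def acq_func(self, new_acq):
        self._acq_func = new_acq
        # In the real library this hook would re-optimize the new
        # acquisition function over the current GP posterior.
        self._recompute_proposal()

    def _recompute_proposal(self):
        self.recomputed += 1
```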
The log determinant, which needs to be computed for the marginal log likelihood, slows down inference already at moderate numbers of observations.
This approach might help:
https://arxiv.org/abs/1711.03481
The library currently supports only sequential evaluation of 1 point. When doing parallel hyperparameter optimization on a cluster, it would be beneficial to be able to propose several points.
Depending on the acquisition function used, it is not always clear how to propose several points. For acquisition functions like expected improvement (EI) there exist variants which propose several points at once (Ginsbourger et al. 2007), but they are difficult to compute. The "kriging believer" and "constant liar" heuristics can be applied to all acquisition functions, but also require sequential computation of a number of points.
Thompson sampling can be trivially parallelized and is a good first candidate.
The acquisition functions are currently called using mu and std already evaluated for a set of points:
bayes-skopt/bask/acquisition.py
Lines 24 to 28 in 071d504
For Thompson sampling we instead want to sample from the GP posterior rather than use only the mean process.
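A sketch of the parallel proposal step, using sklearn's GaussianProcessRegressor as a stand-in for BayesGPR: drawing one posterior sample per desired proposal and minimizing each draw yields a batch of points.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.random((8, 1))
y = np.sin(6 * X[:, 0])
gp = GaussianProcessRegressor().fit(X, y)

# Each posterior draw over the candidate set is one Thompson sample
# of the latent function; its argmin becomes one proposal.
candidates = rng.random((256, 1))
samples = gp.sample_y(candidates, n_samples=4, random_state=0)  # (256, 4)
proposals = candidates[np.argmin(samples, axis=0)]              # (4, 1)
```

The four proposals can then be evaluated in parallel on the cluster.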
In some settings it is viable to parallelize the evaluation of the target function, and results might arrive delayed. In order to update the model with the new information, it would be useful if the tell method supported replacing the data instead of appending to it.
Is your feature request related to a problem? Please describe.
Currently, it is hard to gauge how good the optimum found so far is.
This makes it hard to decide when to terminate an optimization run.
Describe the solution you'd like
A method Optimizer.probability_of_optimality(epsilon) should be implemented. It will output the probability that the current optimum is optimal within a tolerance of ε.
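One way the method could work, as a hedged Monte Carlo sketch: draw joint posterior samples over a candidate set containing the incumbent and count how often the incumbent is within ε of the sampled minimum. The mean/cov/best_idx inputs are illustrative; the real method would obtain them from BayesGPR.

```python
import numpy as np

def probability_of_optimality(mean, cov, best_idx, epsilon,
                              n_draws=2000, rng=None):
    # mean, cov: GP posterior over a candidate set (incumbent included).
    rng = np.random.default_rng(rng)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    # The incumbent is epsilon-optimal in a draw if no candidate beats
    # it by more than epsilon (minimization assumed).
    gap = draws[:, best_idx] - draws.min(axis=1)
    return float(np.mean(gap <= epsilon))
```

A value close to 1 would then be a reasonable termination signal.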
We currently employ maximum-value entropy search as the main acquisition function.
Nguyen et al (2017) argue that concentrating on collecting information about the value y* is not enough to find the optimum position x*.
They instead propose the following algorithm.
Downside: without an efficient implementation of Thompson sampling, this criterion is very costly to evaluate.
The library only supports passing one acquisition function right now.
To support ensembles (e. g. GP hedge) or schedules (e. g. first explore using Thompson sampling, then switch to MES/EI to exploit), it should be possible to pass arbitrary callback classes.
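A sketch of what such a callback class could look like; the __call__(iteration) interface is an assumption, not an existing bask contract:

```python
class AcquisitionSchedule:
    """Return a different acquisition function depending on the iteration."""

    def __init__(self, schedule):
        # schedule: iterable of (start_iteration, acquisition) pairs.
        self.schedule = sorted(schedule, key=lambda pair: pair[0])

    def __call__(self, iteration):
        # The last entry whose start_iteration has been reached wins.
        current = self.schedule[0][1]
        for start, acq in self.schedule:
            if iteration >= start:
                current = acq
        return current
```

This covers the schedule case directly (e.g. explore with Thompson sampling first, then switch to MES/EI); a GP-hedge ensemble would instead pick among acquisitions probabilistically inside __call__.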
Currently we pin scikit-learn to 0.22, because the Gaussian process implementation of scikit-learn 0.23 introduced normalize_y with division by the standard deviation. That causes problems when all data points produce the same output.
See:
scikit-optimize/scikit-optimize#947
scikit-learn/scikit-learn#18371
scikit-learn/scikit-learn#18318
scikit-learn/scikit-learn#18388
Update scikit-learn to >=0.22,<0.24 and scikit-optimize to ^0.8.
Currently, the library uses scikit-optimize/sklearn as a backend to do Gaussian process computations. These implementations are easy to use and served the library well so far.
A big problem is that the library is quite limited in functionality. One big use case of bayes-skopt is to handle very noisy target functions. In that regard it would be useful to be able to model heteroscedastic noise, which the Gaussian process in sklearn does not really support. It is possible to set the alpha parameter to a vector, which incorporates the noise of the training data, but predictions are still noiseless.
This could be useful for acquisition functions which properly handle the observation noise, like noisy EI and knowledge gradient.
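The sklearn limitation described above can be demonstrated directly: per-observation noise enters via alpha, but predict returns the noise-free posterior.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(3 * X[:, 0])
noise = np.full(10, 0.1)  # per-observation noise variances

# alpha accepts a vector, so the noise enters the training covariance...
gp = GaussianProcessRegressor(alpha=noise).fit(X, y)
_, std = gp.predict(X, return_std=True)
# ...but std is the epistemic uncertainty of the latent function only;
# the 0.1 observation noise is never added back at prediction time.
```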
GPflow offers a lot of the needed functionality out of the box. It is straightforward to construct heteroscedastic likelihoods. It also supports stochastic variational Gaussian processes, which allow GPs to scale to more than 10k observations.
Therefore, migrating to GPflow as a backend would be a good long-term goal. Sadly, it will require a major rewrite of the library, since many classes (Optimizer, acquisition functions etc.) are tightly coupled to the current GP implementation.
Currently, it is only possible to pass marginal prior distributions to the library, since it iterates over the list of priors:
Lines 208 to 209 in a38808d
A common use case is to save the posterior distribution of the hyperparameters (e.g. using a mixture of Gaussians) and use it to jumpstart subsequent optimization runs.
This library is increasingly becoming difficult to install in python environments, due to the narrow range of scikit-learn versions it supports.
Moreover, a library for optimizing ML algorithms is of limited use if it does not support current versions of scikit-learn.
bayes-skopt/bask/acquisition.py
Lines 215 to 257 in 1f77d51
To make the library more accessible, all publicly facing methods should be properly documented. In addition, example Jupyter notebooks could illustrate how the library is to be used.
Differences to the parent library scikit-optimize need to be clear.
Optimizer(...)
Optimizer.tell(...)
Optimizer.ask()
Optimizer.run(...)
BayesGPR.theta (property)
BayesGPR.noise_set_to_zero (context manager)
BayesGPR.sample(...)
BayesGPR.fit(...)
BayesGPR.sample_y(...)
PVRS
MaxValueSearch
ExpectedImprovement
TopTwoEI
LCB
Expectation
ThompsonSampling
VarianceReduction
BayesGPR to a simple noisy 1d function

It is a common use case that one would like to optimize a certain subspace of the parameter space using the knowledge gained so far for the full space.
MES in very rare cases causes an error here:
r = _zeros._bisect(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
The BayesGPR is a general-purpose Gaussian process, but in this library it is geared heavily towards hyperparameter optimization. Since normalize_y = False can cause some weird behavior when optimizing, we should set it to True by default. This is more in line with what users expect.
We currently use skopt.utils.expected_minimum to compute the best mean point of the Gaussian process. In some cases this function fails, because the BFGS optimizer exceeds the allowed bounds.
In the current implementation, we sample the hyperparameters of the Gaussian process and average across those samples. The training data is fixed. In BoTorch, the hyperparameters are fixed and the observations y of the data points are sampled, and the acquisition function is averaged over those.
I think both ideas can be combined, allowing the user to request:
The steps required are:
bayes-skopt/bask/acquisition.py
Line 48 in 8f1daf9
To Reproduce
Steps to reproduce the behavior:
pip install --upgrade bask==0.10.5
ERROR: Could not find a version that satisfies the requirement bask==0.10.5
ERROR: No matching distribution found for bask==0.10.5
Is there perhaps a connection to there being no tag named v0.10.5 in the GitHub repo?
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
Computation of acquisition functions only on sampled points is problematic in high-dimensional spaces, where the distance to the true optimum (of the acquisition function) will be large on average.
Often times the parameters of the optimizer need to be changed during the optimization process. To this end it would be useful to have support for schedulers, which can set the parameters in relation to the iteration.
Another common application is to plot the current landscape in regular intervals.
Every time the hyperparameters of the BayesGPR are changed, the Cholesky factor L is recomputed. Usually this is desirable to ensure that the model stays up to date, but during optimization we typically sample hyperparameter configurations and do not need the averaged model at every step.