kiudee / bayes-skopt
A fully Bayesian implementation of sequential model-based optimization
Home Page: https://bayes-skopt.readthedocs.io/
License: Other
Currently, TopTwoEI only implements the "TopTwo" part of the algorithm. In the actual algorithm, one decides randomly between EI and TopTwoEI.
zeus is a recently released black-box MCMC library which looks like it could be a drop-in replacement for emcee.
If it results in either (1) faster performance or (2) faster/more robust convergence, it could be worthwhile.
The repository currently uses travis-ci.com, which is no longer free for open-source projects. It should therefore migrate to GitHub Actions; this is also a good opportunity to move from tox to nox.
Reference: https://arxiv.org/pdf/1402.0929.pdf
The computation of tell can become quite slow when the number of observations grows. Since we optimize slow black-box functions, it would be useful to have a tell_async method which does not block.
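One possible shape for this, sketched with Python's standard-library executor; AsyncTeller and tell_async are hypothetical names, not part of bask's API:

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncTeller:
    """Hypothetical wrapper adding a non-blocking tell to an optimizer."""

    def __init__(self, optimizer):
        self.optimizer = optimizer
        # A single worker keeps model updates strictly ordered.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def tell_async(self, x, y):
        # Ensure the previous update finished before queueing the next,
        # so observations are incorporated in order.
        if self._pending is not None:
            self._pending.result()
        self._pending = self._executor.submit(self.optimizer.tell, x, y)
        return self._pending  # a Future; call .result() to block
```

The caller can then keep evaluating the black-box function while the slow model update runs in the background.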
The sample method behaves like the fit function if data is provided. Therefore, all fitted attributes need to be updated as well.
Is your feature request related to a problem? Please describe.
The R2 sequence we currently use for initialization works well for low dimensions but has visible patterns in higher dimensions.
Steinerberger (2019) proposes a simple energy functional which can be used to add discrepancy-minimizing points to an existing design:
Describe alternatives you've considered
I investigated different methods for generating blue noise (e.g. using optimal transport). But they are hard to implement and require expensive computation.
Additional context
Here is a sample of the Steinerberger sequence (1 initial random point)
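A rough sketch of the greedy construction, assuming the energy functional has the per-coordinate product form 1 − log(2 sin(π|x_j − y_j|)) used in Steinerberger (2019); the exact functional should be checked against the paper, and add_point is purely illustrative:

```python
import numpy as np

def steinerberger_energy(x, points):
    # Energy of candidate x with respect to existing points in [0, 1]^d.
    # Each coordinate contributes 1 - log(2 sin(pi * |x_j - y_j|));
    # the sine makes the distance toroidal automatically.
    diff = np.abs(points - x)                        # shape (n, d)
    terms = 1.0 - np.log(2.0 * np.sin(np.pi * diff))
    return np.prod(terms, axis=1).sum()

def add_point(points, n_candidates=4096, rng=None):
    # Greedy step: evaluate the energy on random candidates and keep
    # the minimizer, extending the design one point at a time.
    rng = np.random.default_rng(rng)
    candidates = rng.random((n_candidates, points.shape[1]))
    energies = [steinerberger_energy(c, points) for c in candidates]
    return candidates[int(np.argmin(energies))]
```

Repeatedly calling add_point on a design that starts from one random point would yield a sequence like the sample above.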
Emcee 3.1.0 was recently released and an optimization using bask raised
File "/mnt/tuning-server/venv/lib/python3.8/site-packages/emcee/backends/backend.py", line 175, in grow
a = np.empty((i, self.nwalkers, self.ndim), dtype=self.dtype)
TypeError: 'numpy.float64' object cannot be interpreted as an integer
The library bayes-skopt currently supports all 3.x.x versions:
Line 30 in 7f3e7af
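The traceback boils down to NumPy rejecting a float inside a shape tuple; a minimal reproduction (the actual fix would be to ensure that whatever bask passes to emcee for the chain length is an integer):

```python
import numpy as np

# Shape entries must be integers; a numpy.float64 raises exactly the
# TypeError seen in the traceback above.
try:
    np.empty((np.float64(5), 2))
    message = "no error"
except TypeError as exc:
    message = str(exc)

# Casting the offending value restores the expected behavior.
arr = np.empty((int(np.float64(5)), 2))
```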
Currently, the optimizer calls the fit function every time the tell function is called (once the init phase is over). This is unnecessary, since MCMC sampling is used to update the parameters of the GP.
Lines 93 to 100 in 071d504
Is your feature request related to a problem? Please describe.
The Optimizer.run method does not offer all of the parameters the tell method offers.
Describe the solution you'd like
The parameters and their default values should be consistent.
It is possible for users to change the acquisition function manually during an optimization run.
Example:
from bask import Optimizer
from bask.acquisition import PVRS
opt = Optimizer(...)
opt.acq_func = PVRS()
However, this only takes effect after the next point has been evaluated. It would be useful to implement a setter which triggers the recomputation.
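A minimal sketch of such a setter; this Optimizer is a stand-in, and _recompute_proposal is a hypothetical hook rather than an existing bask method:

```python
class Optimizer:
    """Minimal stand-in showing an acq_func property with a setter."""

    def __init__(self, acq_func):
        self._acq_func = acq_func
        self.recomputed = 0

    @property
    def acq_func(self):
        return self._acq_func

    @acq_func.setter
    def acq_func(self, new_acq):
        self._acq_func = new_acq
        # In the real library this hook would re-optimize the new
        # acquisition function over the current GP posterior.
        self._recompute_proposal()

    def _recompute_proposal(self):
        self.recomputed += 1
```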
The log determinant, which needs to be computed for the marginal log likelihood, slows down inference already at moderate numbers of observations.
This approach might help:
https://arxiv.org/abs/1711.03481
The library currently supports only sequential evaluation of 1 point. When doing parallel hyperparameter optimization on a cluster, it would be beneficial to be able to propose several points.
Depending on the acquisition function used, it is not always clear how to propose several points. For acquisition functions like expected improvement (EI) there exist variants which propose several points at once (Ginsbourger et al. 2007), but they are difficult to compute. The "kriging believer" and "constant liar" heuristics can be applied to all acquisition functions, but also require sequential computation of a number of points.
Thompson sampling can be trivially parallelized and is a good first candidate.
The acquisition functions are currently called using mu and std already evaluated for a set of points:
bayes-skopt/bask/acquisition.py
Lines 24 to 28 in 071d504
For Thompson sampling we instead want to sample from the GP posterior rather than use only the mean process.
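A sketch of the parallel proposal step, using sklearn's GaussianProcessRegressor as a stand-in for BayesGPR: drawing one posterior sample per desired proposal and minimizing each draw yields a batch of points.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.random((8, 1))
y = np.sin(6 * X[:, 0])
gp = GaussianProcessRegressor().fit(X, y)

# Each posterior draw over the candidate set is one Thompson sample
# of the latent function; its argmin becomes one proposal.
candidates = rng.random((256, 1))
samples = gp.sample_y(candidates, n_samples=4, random_state=0)  # (256, 4)
proposals = candidates[np.argmin(samples, axis=0)]              # (4, 1)
```

The four proposals can then be evaluated in parallel on the cluster.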
In some settings it is viable to parallelize the evaluation of the target function, and results might arrive delayed. In order to update the model with the new information, it would be useful if the tell method supported replacing the data instead of appending to it.
Is your feature request related to a problem? Please describe.
Currently, it is hard to gauge how good the optimum found so far is.
This makes it hard to decide when to terminate an optimization run.
Describe the solution you'd like
A method Optimizer.probability_of_optimality(epsilon) should be implemented. It will output the probability that the current optimum is optimal within a tolerance of ε.
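One way the method could work, as a hedged Monte Carlo sketch: draw joint posterior samples over a candidate set containing the incumbent and count how often the incumbent is within ε of the sampled minimum. The mean/cov/best_idx inputs are illustrative; the real method would obtain them from BayesGPR.

```python
import numpy as np

def probability_of_optimality(mean, cov, best_idx, epsilon,
                              n_draws=2000, rng=None):
    # mean, cov: GP posterior over a candidate set (incumbent included).
    rng = np.random.default_rng(rng)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    # The incumbent is epsilon-optimal in a draw if no candidate beats
    # it by more than epsilon (minimization assumed).
    gap = draws[:, best_idx] - draws.min(axis=1)
    return float(np.mean(gap <= epsilon))
```

A value close to 1 would then be a reasonable termination signal.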
We currently employ maximum-value entropy search as the main acquisition function.
Nguyen et al (2017) argue that concentrating on collecting information about the value y* is not enough to find the optimum position x*.
They instead propose the following algorithm.
Downside: without an efficient implementation of Thompson sampling, this criterion is very costly to evaluate.
The library only supports passing one acquisition function right now.
To support ensembles (e. g. GP hedge) or schedules (e. g. first explore using Thompson sampling, then switch to MES/EI to exploit), it should be possible to pass arbitrary callback classes.
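A sketch of what such a callback class could look like; the __call__(iteration) interface is an assumption, not an existing bask contract:

```python
class AcquisitionSchedule:
    """Return a different acquisition function depending on the iteration."""

    def __init__(self, schedule):
        # schedule: iterable of (start_iteration, acquisition) pairs.
        self.schedule = sorted(schedule, key=lambda pair: pair[0])

    def __call__(self, iteration):
        # The last entry whose start_iteration has been reached wins.
        current = self.schedule[0][1]
        for start, acq in self.schedule:
            if iteration >= start:
                current = acq
        return current
```

This covers the schedule case directly (e.g. explore with Thompson sampling first, then switch to MES/EI); a GP-hedge ensemble would instead pick among acquisitions probabilistically inside __call__.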
Currently we pin scikit-learn to 0.22, because the Gaussian process implementation of scikit-learn 0.23 introduced normalize_y with division by the standard deviation. That causes problems when all data points produce the same output.
See:
scikit-optimize/scikit-optimize#947
scikit-learn/scikit-learn#18371
scikit-learn/scikit-learn#18318
scikit-learn/scikit-learn#18388
Update scikit-learn to >=0.22,<0.24 and scikit-optimize to ^0.8.
Currently, the library uses scikit-optimize/sklearn as a backend to do Gaussian process computations. These implementations are easy to use and served the library well so far.
A big problem is that the library is quite limited in functionality. One big use case of bayes-skopt is to handle very noisy target functions. In that regard it would be useful to be able to model heteroscedastic noise, which the Gaussian process in sklearn does not really support. It is possible to set the alpha parameter to a vector, which incorporates the noise of the training data, but predictions are still noiseless.
This could be useful for acquisition functions which properly handle the observation noise, like noisy EI and knowledge gradient.
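The sklearn limitation described above can be demonstrated directly: per-observation noise enters via alpha, but predict returns the noise-free posterior.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(3 * X[:, 0])
noise = np.full(10, 0.1)  # per-observation noise variances

# alpha accepts a vector, so the noise enters the training covariance...
gp = GaussianProcessRegressor(alpha=noise).fit(X, y)
_, std = gp.predict(X, return_std=True)
# ...but std is the epistemic uncertainty of the latent function only;
# the 0.1 observation noise is never added back at prediction time.
```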
GPflow offers a lot of the needed functionality out of the box. It is straightforward to construct heteroscedastic likelihoods. It also supports stochastic variational Gaussian processes, which allow GPs to scale to more than 10k observations.
Therefore, migrating to GPflow as a backend would be a good long-term goal. Sadly, it will require a major rewrite of the library, since many classes (Optimizer, acquisition functions etc.) are tightly coupled to the current GP implementation.
Currently, it is only possible to pass marginal prior distributions to the library, since it iterates over the list of priors:
Lines 208 to 209 in a38808d
A common use case is to save the posterior distribution of the hyperparameters (e.g. using a mixture of Gaussians) and use it to jumpstart subsequent optimization runs.
This library is increasingly becoming difficult to install in python environments, due to the narrow range of scikit-learn versions it supports.
Moreover, a library for optimizing ML algorithms is of limited use if it does not support current versions of scikit-learn.
bayes-skopt/bask/acquisition.py
Lines 215 to 257 in 1f77d51
To make the library more accessible, all publicly facing methods should be properly documented. In addition, example Jupyter notebooks could illustrate how the library is to be used.
Differences to the parent library scikit-optimize need to be clear.
Optimizer(...)
Optimizer.tell(...)
Optimizer.ask()
Optimizer.run(...)
BayesGPR.theta (property)
BayesGPR.noise_set_to_zero (context manager)
BayesGPR.sample(...)
BayesGPR.fit(...)
BayesGPR.sample_y(...)
PVRS
MaxValueSearch
ExpectedImprovement
TopTwoEI
LCB
Expectation
ThompsonSampling
VarianceReduction
BayesGPR to a simple noisy 1d function

It is a common use case that one would like to optimize a certain subspace of the parameter space using the knowledge gained so far for the full space.
MES in very rare cases causes an error here:
r = _zeros._bisect(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
The BayesGPR is a general-purpose Gaussian process, but in this library it is geared heavily towards hyperparameter optimization. Since normalize_y = False can cause some weird behavior when optimizing, we should set it to True by default. This is more in line with what users expect.
We currently use skopt.utils.expected_minimum to compute the best mean point of the Gaussian process. In some cases this function fails, because the BFGS optimizer exceeds the allowed bounds.
In the current implementation, we sample the hyperparameters of the Gaussian process and average across those samples. The training data is fixed. In BoTorch, the hyperparameters are fixed and the observations y of the data points are sampled, and the acquisition function is averaged over those.
I think both ideas can be combined, allowing the user to request:
The steps required are:
bayes-skopt/bask/acquisition.py
Line 48 in 8f1daf9
To Reproduce
Steps to reproduce the behavior:
pip install --upgrade bask==0.10.5
ERROR: Could not find a version that satisfies the requirement bask==0.10.5
ERROR: No matching distribution found for bask==0.10.5
Is there perhaps a connection to there being no tag named v0.10.5 in the GitHub repo?
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
Computation of acquisition functions only on sampled points is problematic in high-dimensional spaces, where the distance to the true optimum (of the acquisition function) will be large on average.
Often times the parameters of the optimizer need to be changed during the optimization process. To this end it would be useful to have support for schedulers, which can set the parameters in relation to the iteration.
Another common application is to plot the current landscape in regular intervals.
Every time the hyperparameters of the BayesGPR are changed, the Cholesky factor L is recomputed. Usually this is desirable to ensure that the model stays up to date, but during optimization we typically sample hyperparameter configurations and do not need the averaged model at every step.