bayes-skopt's People

Contributors

akshgpt7, deepsource-autofix[bot], deepsourcebot, evhub, kiudee, pyup-bot

bayes-skopt's Issues

Version 0.10.5 not available through pip

To Reproduce
Steps to reproduce the behavior:

pip install --upgrade bask==0.10.5
ERROR: Could not find a version that satisfies the requirement bask==0.10.5
ERROR: No matching distribution found for bask==0.10.5

Is there perhaps a connection to there being no tag named v0.10.5 in the GitHub repo?

Desktop (please complete the following information):

  • OS: Linux

Fix error in MES criterion

MES in very rare cases causes an error here:

r = _zeros._bisect(f, a, b, xtol, rtol, maxiter, args, full_output, disp)

Implement predictive variance reduction search

We currently employ maximum-value entropy search as the main acquisition function.
Nguyen et al. (2017) argue that concentrating on collecting information about the value y* is not enough to find the optimum position x*.

They instead propose the following algorithm:
[Figure: screenshot of the algorithm from the Predictive Variance Reduction Search paper]

where
[Figure: screenshot of the accompanying definitions from the Predictive Variance Reduction Search paper]

Downside: without an efficient implementation of Thompson sampling, this criterion is very costly to evaluate.

Support callbacks for the optimizer

Often the parameters of the optimizer need to be changed during the optimization process. To this end it would be useful to have support for schedulers, which can set the parameters as a function of the iteration.

Another common application is to plot the current landscape in regular intervals.
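A minimal sketch of what such a callback hook could look like. ToyOptimizer, linear_schedule, run and the n_points attribute are all hypothetical names made up for illustration; none of them is the bask API.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; not the bask API."""

    def __init__(self, n_points=500):
        self.n_points = n_points  # a parameter a scheduler may want to change
        self.iteration = 0

    def step(self):
        # A real optimizer would ask for a point, evaluate it and tell the result.
        self.iteration += 1


def linear_schedule(start, stop, n_iterations):
    """Return a callback that moves n_points linearly from start to stop."""

    def callback(opt):
        frac = min(opt.iteration / n_iterations, 1.0)
        opt.n_points = int(round(start + frac * (stop - start)))

    return callback


def run(opt, n_iterations, callbacks=()):
    for _ in range(n_iterations):
        opt.step()
        for callback in callbacks:
            callback(opt)  # invoked once per iteration
    return opt


opt = run(ToyOptimizer(), 10, callbacks=[linear_schedule(500, 1000, 10)])
```

A plotting callback would fit the same hook: it would receive the optimizer and draw the landscape every k-th iteration.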

Investigate the zeus library

The zeus library is a recent black-box MCMC library, which looks like it can be a drop-in replacement for emcee.
If it results in either (1) faster performance or (2) faster/more robust convergence, it could be worthwhile.

Improve documentation

To make the library more accessible, all public-facing methods should be properly documented. In addition, example Jupyter notebooks could help illustrate how the library is meant to be used.
Differences from the parent library scikit-optimize need to be made clear.

To do

  • Set up API reference in sphinx
  • Write the docstrings for
    • Optimizer(...)
    • Optimizer.tell(...)
    • Optimizer.ask()
    • Optimizer.run(...)
    • BayesGPR.theta (property)
    • BayesGPR.noise_set_to_zero (context manager)
    • BayesGPR.sample(...)
    • BayesGPR.fit(...)
    • BayesGPR.sample_y(...)
    • Acquisition functions:
      • PVRS
      • MaxValueSearch
      • ExpectedImprovement
      • TopTwoEI
      • LCB
      • Expectation
      • ThompsonSampling
      • VarianceReduction
  • Write example notebooks for
    • How to fit a BayesGPR to a simple noisy 1d function
    • How to optimize a simple noisy 1d function
    • How to warm start an optimization
    • How to save/resume
    • How to save the hyperposterior and load it for the next optimization
  • Write usage instructions in sphinx
  • Link notebooks to sphinx

Support joint prior distributions

Currently, it is only possible to pass marginal prior distributions to the library, since it iterates over the list of priors:

    for prior, val in zip(priors, x):
        lp += prior(val)

A common use case is to save the posterior distribution of the hyperparameters (e.g. using a mixture of Gaussians) and use it to jumpstart subsequent optimization runs.
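One possible shape for this, as a sketch only: accept either a list of marginal log-priors (the current behavior) or a single callable that scores the whole hyperparameter vector. The function name log_prior and the "callable means joint" convention are assumptions, not existing library behavior.

```python
import numpy as np
from scipy import stats


def log_prior(x, priors):
    """Evaluate the log-prior of x.

    priors is either a sequence of marginal log-density callables
    (one per dimension) or a single callable taking the full vector.
    """
    x = np.asarray(x, dtype=float)
    if callable(priors):
        return float(priors(x))  # joint prior over all dimensions
    if len(priors) != len(x):
        raise ValueError("need exactly one marginal prior per dimension")
    return float(sum(prior(val) for prior, val in zip(priors, x)))


x = np.array([0.3, -1.2])
# Two independent standard normals, given as marginals ...
marginals = [stats.norm(0.0, 1.0).logpdf, stats.norm(0.0, 1.0).logpdf]
# ... and the same distribution expressed as one joint density.
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).logpdf

lp_marginal = log_prior(x, marginals)
lp_joint = log_prior(x, joint)
```

For independent priors both calls agree; a fitted mixture model could be passed as the joint callable in the same way.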

Update dependencies

Currently we pin scikit-learn to 0.22, because the Gaussian process implementation of scikit-learn 0.23 introduced normalize_y with division by the standard deviation. This causes problems when all data points produce the same output, since the standard deviation is then zero.

See:
scikit-optimize/scikit-optimize#947
scikit-learn/scikit-learn#18371
scikit-learn/scikit-learn#18318
scikit-learn/scikit-learn#18388
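The failure mode can be reproduced with plain NumPy. This is a standalone sketch, not scikit-learn's code; the guard that replaces a zero standard deviation by 1 is just one possible workaround.

```python
import numpy as np


def normalize(y, guard=True):
    """Center y and divide by its standard deviation."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    std = y.std()
    if guard and std == 0.0:
        std = 1.0  # constant targets: skip the rescaling
    with np.errstate(divide="ignore", invalid="ignore"):
        return (y - mean) / std


constant_y = [2.0, 2.0, 2.0]
unguarded = normalize(constant_y, guard=False)  # 0 / 0 in every entry
guarded = normalize(constant_y)
```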

To do

  • Update scikit-learn to >=0.22,<0.24
  • Update scikit-optimize to ^0.8
  • Evaluate if tests need to be recalculated (likely) and update them

Stabilize best_mean output in BayesSearchCV

We are currently using skopt.utils.expected_minimum to compute the best mean point of the Gaussian process. In some cases this function fails, because the BFGS optimizer leaves the allowed parameter ranges.
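One way to keep the search inside the box, sketched with SciPy's bound-constrained L-BFGS-B method. The helper bounded_minimum and the toy mean function are assumptions for illustration; this is not how skopt.utils.expected_minimum is implemented.

```python
import numpy as np
from scipy.optimize import minimize


def bounded_minimum(mean_fn, bounds, n_restarts=10, seed=0):
    """Minimize mean_fn inside the given box using random restarts."""
    rng = np.random.default_rng(seed)
    lows = np.array([low for low, _ in bounds], dtype=float)
    highs = np.array([high for _, high in bounds], dtype=float)
    best_x, best_val = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(lows, highs)
        res = minimize(mean_fn, x0, method="L-BFGS-B", bounds=bounds)
        x = np.clip(res.x, lows, highs)  # extra guard against round-off
        val = float(mean_fn(x))
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_mean(x):
    # Stand-in for the GP mean; its unconstrained minimum lies outside the box.
    return float(np.sum((x - 2.0) ** 2))


x_best, val_best = bounded_minimum(toy_mean, [(0.0, 1.0), (0.0, 1.0)])
```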

Support setting constant parameters

It is a common use case that one would like to optimize a certain subspace of the parameter space using the knowledge gained so far for the full space.

Avoid unnecessary computation of Cholesky decompositions

Every time the hyperparameters of the BayesGPR are changed, the Cholesky factor L is recomputed. Usually this is desirable, to ensure that the model stays up-to-date. During optimization, however, we usually sample hyperparameter configurations and do not need the average.
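A sketch of one way to avoid the repeated work: mark the factor as stale when a hyperparameter changes and only factorize when L is actually read. LazyCholeskyGP is a toy class with a single length scale, not BayesGPR.

```python
import numpy as np


class LazyCholeskyGP:
    """Toy model that factorizes its kernel matrix only on demand."""

    def __init__(self, X, length_scale=0.3, noise=1e-4):
        self.X = np.asarray(X, dtype=float)
        self.noise = noise
        self._length_scale = float(length_scale)
        self._L = None  # cached Cholesky factor, None means stale
        self.n_factorizations = 0

    @property
    def length_scale(self):
        return self._length_scale

    @length_scale.setter
    def length_scale(self, value):
        self._length_scale = float(value)
        self._L = None  # mark stale instead of refactorizing right away

    def kernel_matrix(self):
        d2 = ((self.X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        K = np.exp(-0.5 * d2 / self._length_scale**2)
        return K + self.noise * np.eye(len(self.X))

    @property
    def L(self):
        if self._L is None:
            self._L = np.linalg.cholesky(self.kernel_matrix())
            self.n_factorizations += 1
        return self._L


gp = LazyCholeskyGP(np.linspace(0.0, 1.0, 5)[:, None])
for value in (0.1, 0.2, 0.4):
    gp.length_scale = value  # no factorization happens here
factor = gp.L  # single factorization, using the latest value
```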

Support ensembles of acquisition functions

The library only supports passing one acquisition function right now.
To support ensembles (e.g. GP hedge) or schedules (e.g. first explore using Thompson sampling, then switch to MES/EI to exploit), it should be possible to pass arbitrary callback classes.
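A schedule could be expressed as a callable that wraps two acquisition functions and picks one based on the iteration. The sketch below uses two toy criteria (a lower confidence bound and the plain mean, lower is better) in place of Thompson sampling and MES/EI; the class name SwitchAfter and the iteration argument are assumptions.

```python
import numpy as np


def lcb(mu, std, kappa=2.0):
    """Lower confidence bound: favors uncertain points (exploration)."""
    return mu - kappa * std


def mean_only(mu, std):
    """Pure exploitation: ignore the uncertainty."""
    return mu


class SwitchAfter:
    """Use `first` for the first n_switch iterations, then `second`."""

    def __init__(self, first, second, n_switch):
        self.first = first
        self.second = second
        self.n_switch = n_switch

    def __call__(self, mu, std, iteration):
        acq = self.first if iteration < self.n_switch else self.second
        return acq(mu, std)


mu = np.array([0.0, 0.1])
std = np.array([0.0, 1.0])
schedule = SwitchAfter(lcb, mean_only, n_switch=3)
early_choice = int(np.argmin(schedule(mu, std, iteration=0)))
late_choice = int(np.argmin(schedule(mu, std, iteration=5)))
```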

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Augment hyperposterior sampling by also sampling observed values

In the current implementation, we sample the hyperparameters of the Gaussian process and average across those samples. The training data is fixed. In BoTorch, the hyperparameters are fixed and the observations y of the data points are sampled, and the acquisition function is averaged over those.

I think both ideas can be combined, allowing the user to request:

  • Holding both hyperparameters and observations fixed: This will yield the classical acquisition functions for the noiseless case.
  • Sampling only the observations y. This will yield the noisy acquisition function versions, assuming that the hyperparameters are accurate.
  • Sampling only the hyperparameters. This will yield the current behavior, where we are robust to model misspecification.
  • Sampling both the hyperparameters and the observations. Combining the strengths of both approaches.

The steps required are:

  • Implement a fast method for cloning the GP model using the same, or different hyperparameters.
  • Add the different methods for averaging to
    def evaluate_acquisitions(

Implement an asynchronous version of tell

The computation of tell can become quite slow when the number of observations grows. Since we optimize slow black-box functions, it would be useful to have a tell_async method which would not block.
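A rough sketch of a non-blocking variant built on concurrent.futures. ToyOptimizer stands in for the real optimizer, and the sleep stands in for the slow model update; a single worker thread keeps the updates in submission order.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer with a slow tell."""

    def __init__(self):
        self.Xi, self.yi = [], []
        # One worker so that queued updates are applied sequentially.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def tell(self, x, y):
        time.sleep(0.05)  # placeholder for refitting the model
        self.Xi.append(x)
        self.yi.append(y)

    def tell_async(self, x, y) -> Future:
        """Queue the update and return immediately."""
        return self._executor.submit(self.tell, x, y)


opt = ToyOptimizer()
future = opt.tell_async([0.1], 1.0)  # returns without waiting for the update
future.result(timeout=5)  # block only when the updated model is needed
```

A real implementation would also have to decide what ask should return while an update is still pending.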

Allow optimization based acquisition functions

Computation of acquisition functions only on sampled points is problematic in high-dimensional spaces, where the distance to the true optimum (of the acquisition function) will be large on average.

Support proposal of several points in parallel

Rationale

The library currently supports only sequential evaluation of 1 point. When doing parallel hyperparameter optimization on a cluster, it would be beneficial to be able to propose several points.

Depending on the acquisition function used, it is more or less clear how to propose several points. For acquisition functions like expected improvement (EI), there exist variants that propose several points at once (Ginsbourger et al. 2007), but they are difficult to compute. The "kriging believer" and "constant liar" heuristics can be applied to all acquisition functions, but they also require sequential computation of a number of points.
Thompson sampling can be trivially parallelized and is a good first candidate.

Tasks

  • Set up the interfaces and the library to support multiple points (including caching and on-demand computation).
  • Implement parallel computation for Thompson sampling.
  • Implement constant liar/kriging believer.
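The constant liar heuristic could be sketched as follows: after each proposal, pretend the point returned a fixed value (here the best value seen so far), add it to the data and propose again. The propose callable and the distance-based toy proposer are assumptions; a real proposer would refit the model on the augmented data, which is where the lie takes effect.

```python
import numpy as np


def constant_liar(propose, X, y, n_points, lie=None):
    """Build a batch by repeatedly proposing and inserting a fake result."""
    X = [np.asarray(x, dtype=float) for x in X]
    y = list(y)
    lie_value = min(y) if lie is None else lie
    batch = []
    for _ in range(n_points):
        x_next = np.asarray(propose(np.array(X), np.array(y)), dtype=float)
        batch.append(x_next)
        X.append(x_next)  # treat the pending point as already evaluated
        y.append(lie_value)
    return np.array(batch)


def farthest_candidate(candidates):
    """Toy proposer: pick the candidate farthest from all known points."""

    def propose(X, y):
        dist = np.linalg.norm(candidates[:, None, :] - X[None, :, :], axis=-1)
        return candidates[np.argmax(dist.min(axis=1))]

    return propose


candidates = np.linspace(0.0, 1.0, 11)[:, None]
batch = constant_liar(farthest_candidate(candidates), [[0.0]], [1.0], n_points=3)
```

The kriging believer variant would use the model's predicted mean at x_next in place of the constant lie.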

Implement support for Thompson sampling

The acquisition functions are currently called using mu and std already evaluated for a set of points:

mu, std = gpr.predict(X, return_std=True)
for j, acq in enumerate(acquisition_functions):
    tmp_out = acq(mu, std, **kwargs)
    if np.all(np.isfinite(tmp_out)):
        acq_output[j] += tmp_out

For Thompson sampling we want to draw a sample from the GP instead of using the mean process.
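To illustrate the difference: Thompson sampling needs a joint draw over all candidate points, which requires the full posterior covariance and not just mu and std. The sketch below computes the posterior by hand with a fixed RBF kernel; the function names and the kernel settings are assumptions, and nothing here uses the BayesGPR class.

```python
import numpy as np


def rbf(A, B, length_scale=0.2):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def thompson_sample(X_train, y_train, X_cand, noise=1e-4, seed=None):
    """Draw one function from the GP posterior and return its minimizer."""
    rng = np.random.default_rng(seed)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_cand, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf(X_cand, X_cand) - Ks @ np.linalg.solve(K, Ks.T)
    cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry
    # Joint draw over all candidates, not independent per-point draws.
    draw = rng.multivariate_normal(mean, cov, check_valid="ignore")
    return X_cand[np.argmin(draw)], draw


X_train = np.array([[0.1], [0.5], [0.9]])
y_train = np.array([0.3, -0.2, 0.4])
X_cand = np.linspace(0.0, 1.0, 50)[:, None]
x_next, draw = thompson_sample(X_train, y_train, X_cand, seed=0)
```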

Investigate the compatibility with emcee 3.1.0

Emcee 3.1.0 was recently released and an optimization using bask raised:

File "/mnt/tuning-server/venv/lib/python3.8/site-packages/emcee/backends/backend.py", line 175, in grow
a = np.empty((i, self.nwalkers, self.ndim), dtype=self.dtype)
TypeError: 'numpy.float64' object cannot be interpreted as an integer

The library bayes-skopt currently supports all 3.x.x versions:

emcee = "^3.0.2"

but should 3.1.0 break compatibility, steps should be taken to fix that.
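The traceback itself can be reproduced in isolation: NumPy refuses a float64 as an array dimension. Where the float originates (in bask or in emcee) is not clear from the report, so the cast below is only a sketch of the kind of fix that might apply; allocate_chain is a hypothetical helper.

```python
import numpy as np


def allocate_chain(n_steps, n_walkers, n_dim):
    """Allocate a chain array, coercing the step count to a Python int."""
    return np.empty((int(n_steps), n_walkers, n_dim))


n_steps = np.float64(10)  # e.g. the result of a floating point computation
try:
    np.empty((n_steps, 4, 2))
    raised = False
except TypeError:
    raised = True  # same error type as in the traceback above

chain = allocate_chain(n_steps, 4, 2)
```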

Implement probability of optimality

Is your feature request related to a problem? Please describe.
Currently, it is hard to gauge how good the optimum found so far is.
This makes it hard to decide when to terminate an optimization run.

Describe the solution you'd like
A method Optimizer.probability_of_optimality(epsilon) should be implemented.
It will output the probability that the current optimum is optimal with a tolerance of ε.
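One possible reading of this quantity, sketched as a Monte Carlo estimate: draw functions from the posterior on a set of candidate points and count how often the current optimum is within ε of the sampled minimum. This interpretation, the standalone function signature and the toy samples are assumptions; the issue does not fix the exact definition.

```python
import numpy as np


def probability_of_optimality(samples, best_index, epsilon):
    """Fraction of posterior draws in which the point is epsilon-optimal.

    samples has shape (n_draws, n_candidates); each row is one function
    drawn from the posterior, evaluated at all candidate points.
    """
    samples = np.asarray(samples, dtype=float)
    minima = samples.min(axis=1)
    return float(np.mean(samples[:, best_index] <= minima + epsilon))


# Three toy draws over two candidate points.
samples = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.05]]
p_first = probability_of_optimality(samples, best_index=0, epsilon=0.1)
p_second_loose = probability_of_optimality(samples, best_index=1, epsilon=0.1)
p_second_tight = probability_of_optimality(samples, best_index=1, epsilon=0.01)
```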

Support for GPFlow as a backend

Currently, the library uses scikit-optimize/sklearn as a backend to do Gaussian process computations. These implementations are easy to use and served the library well so far.
A big problem is that the library is quite limited in functionality. One big use case of bayes-skopt is to handle very noisy target functions. In that regard it would be useful to be able to model heteroscedastic noise, which the Gaussian process in sklearn does not really support. It is possible to set the alpha parameter to a vector, which will incorporate the noise of the training data, but during prediction it is still noiseless.
This could be useful for acquisition functions which properly handle the observation noise, like noisy EI and knowledge gradient.

GPFlow offers a lot of the needed functionality out of the box. It is straightforward to construct heteroscedastic likelihoods. It also supports stochastic variational Gaussian processes, which allow GPs to scale to more than 10k observations.
Therefore, migrating to GPflow as a backend would be a good long-term goal. Sadly, it will require a major rewrite of the library, since many classes (Optimizer, acquisition functions etc.) are tightly coupled to the current GP implementation.

Roadmap

  • Get familiar with all relevant aspects of GPFlow.
  • Identify an interface for the GP component which would allow plugging in different backends.
  • Migrate the existing code to obey the new interface specification, reducing the coupling between the components.
  • Add the GPFlow GP implementation.
  • Add GPFlow specific features.

Migrate to GitHub Actions & Nox

The repository is currently using travis-ci.com, which is no longer free for open source projects.
That is why the repository should migrate to GitHub Actions. The opportunity can be used to move from tox to nox.

Expose random seed for MaxValueSearch

class MaxValueSearch(UncertaintyAcquisition):
    """Select points based on their mutual information with the optimum value.

    Parameters
    ----------
    n_min_samples : int, default=1000
        Number of samples for the optimum distribution

    References
    ----------
    [1] Wang, Z. & Jegelka, S.. (2017). Max-value Entropy Search for Efficient
        Bayesian Optimization. Proceedings of the 34th International Conference
        on Machine Learning, in PMLR 70:3627-3635
    """

    def __call__(self, mu, std, *args, n_min_samples=1000, **kwargs):
        def probf(x):
            return np.exp(np.sum(st.norm.logcdf((x - mean) / std), axis=0))

        # Negative sign, since the original algorithm is defined in terms of the maximum
        mean = -mu
        left = np.min(mean - 3 * std)
        right = np.max(mean + 5 * std)
        # Binary search for 3 percentiles
        q1, med, q2 = [
            brentq(lambda x: probf(x) - val, left, right) for val in [0.25, 0.5, 0.75]
        ]
        beta = (q1 - q2) / (np.log(np.log(4.0 / 3.0)) - np.log(np.log(4.0)))
        alpha = med + beta * np.log(np.log(2.0))
        max_values = (
            -np.log(-np.log(np.random.rand(n_min_samples).astype(np.float32))) * beta
            + alpha
        )
        gamma = (max_values[None, :] - mean[:, None]) / std[:, None]
        return (
            np.sum(
                gamma * st.norm().pdf(gamma) / (2.0 * st.norm().cdf(gamma))
                - st.norm().logcdf(gamma),
                axis=1,
            )
            / n_min_samples
        )
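The only random step in the code above is the Gumbel draw via np.random.rand, which uses NumPy's global state. A sketch of how that step could take an explicit seed: sample_max_values and its random_state parameter are hypothetical names, and the float32 cast from the original is left out for brevity.

```python
import numpy as np


def sample_max_values(alpha, beta, n_samples=1000, random_state=None):
    """Draw Gumbel(alpha, beta) samples from a seedable generator.

    random_state may be None, an int seed or a numpy Generator.
    """
    rng = np.random.default_rng(random_state)
    u = rng.random(n_samples)  # uniform draws on [0, 1)
    return -np.log(-np.log(u)) * beta + alpha


first = sample_max_values(0.0, 1.0, n_samples=5, random_state=42)
second = sample_max_values(0.0, 1.0, n_samples=5, random_state=42)
other = sample_max_values(0.0, 1.0, n_samples=5, random_state=43)
```

Passing the same seed reproduces the same max-value samples, which makes acquisition values repeatable across runs.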

Activate normalize_y by default

The BayesGPR is a general-purpose Gaussian process, but this library is geared heavily towards hyperparameter optimization. Since normalize_y = False can cause some weird behavior when optimizing, we should set it to True by default. This is more in line with what users will expect.

Recompute proposal when acquisition function is changed

It is possible for users to change the acquisition function manually during an optimization run.

Example:

from bask import Optimizer
from bask.acquisition import PVRS
opt = Optimizer(...)
opt.acq_func = PVRS()

This however only takes effect after the next point has been evaluated. It would be useful if we implement a setter which does the recomputation.
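Such a setter could simply invalidate the cached proposal so that the next ask recomputes it. The toy class below illustrates the pattern only; the attribute names _next_x and candidates are assumptions and not how the bask Optimizer stores its state.

```python
class ToyOptimizer:
    """Hypothetical optimizer that caches its next proposal."""

    def __init__(self, acq_func):
        self._acq_func = acq_func
        self._next_x = None  # cached proposal
        self.candidates = [0.0, 0.5, 1.0]

    @property
    def acq_func(self):
        return self._acq_func

    @acq_func.setter
    def acq_func(self, value):
        self._acq_func = value
        self._next_x = None  # force recomputation on the next ask()

    def ask(self):
        if self._next_x is None:
            # Lower acquisition value is better in this toy example.
            self._next_x = min(self.candidates, key=self._acq_func)
        return self._next_x


opt = ToyOptimizer(acq_func=lambda x: x)
before = opt.ask()
opt.acq_func = lambda x: -x  # takes effect immediately
after = opt.ask()
```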

Allow tell method to completely override data

Rationale

In some settings it is viable to parallelize the evaluation of the target function and results might arrive delayed. In order to be able to update the model with the new information it would be useful if the tell method supports replacing the data, instead of appending.
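A sketch of the requested behavior with a hypothetical replace flag; the flag name and the toy class are assumptions, and a real implementation would also refit the model afterwards.

```python
class ToyOptimizer:
    """Hypothetical optimizer that only stores its observations."""

    def __init__(self):
        self.Xi, self.yi = [], []

    def tell(self, x, y, replace=False):
        """Add observations; with replace=True discard the old data first."""
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        if replace:
            self.Xi, self.yi = [], []
        self.Xi.extend(x)
        self.yi.extend(y)


opt = ToyOptimizer()
opt.tell([[0.1], [0.2]], [1.0, 2.0])
opt.tell([[0.3]], [3.0])  # default: append
n_after_append = len(opt.yi)
opt.tell([[0.1], [0.3]], [1.5, 2.5], replace=True)  # override everything
```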

Add Steinerberger low-discrepancy sequence

Is your feature request related to a problem? Please describe.
The R2 sequence we currently use for initialization works well for low dimensions but has visible patterns in higher dimensions.
Steinerberger (2019) proposes a simple energy functional that can be used to add points to an existing design such that the discrepancy is minimized:
[Figure: low_discrepancy]

Describe alternatives you've considered
I investigated different methods for generating blue noise (e.g. using optimal transport). But they are hard to implement and require expensive computation.

Additional context
Here is a sample of the Steinerberger sequence (1 initial random point):
[Figure: steinberger]

Here is a uniform sample of 100 points in [0, 1]⁵:
[Figure: uniform]

Our current R2 sequence outputs:
[Figure: r2]
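A greedy extension along these lines could be sketched as below. It assumes the functional has the product form Σ_n Π_k (1 − log(2 sin(π |x_k − x_{n,k}|))), which should be checked against the paper since the formula image is missing here. Minimizing over random candidates instead of running a continuous optimizer is a simplification made for this sketch.

```python
import numpy as np


def energy(candidates, points):
    """Energy of each candidate with respect to the existing points."""
    diff = np.abs(candidates[:, None, :] - points[None, :, :])
    with np.errstate(divide="ignore"):
        # A zero distance gives log(0) = -inf, i.e. infinite repulsion.
        factors = 1.0 - np.log(2.0 * np.sin(np.pi * diff))
    return factors.prod(axis=2).sum(axis=1)


def extend_design(points, n_new, n_candidates=2000, seed=0):
    """Greedily append n_new points, each minimizing the energy."""
    rng = np.random.default_rng(seed)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    for _ in range(n_new):
        candidates = rng.random((n_candidates, points.shape[1]))
        best = candidates[np.argmin(energy(candidates, points))]
        points = np.vstack([points, best])
    return points


design = extend_design([[0.5, 0.5]], n_new=9)
line = extend_design([[0.5]], n_new=1)
```

In one dimension, starting from 0.5, the first added point lands near 0 or 1, which is the farthest location on the torus.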

Update scikit-learn dependency

This library is becoming increasingly difficult to install in Python environments, due to the narrow range of scikit-learn versions it supports.
In addition, you could say that it is not that useful as a library to optimize ML algorithms, if it does not support current versions of scikit-learn.

Unify the parameters of the tell and run method

Is your feature request related to a problem? Please describe.
The Optimizer.run method does not offer all of the parameters the tell method offers.

Describe the solution you'd like
The parameters and their default values should be consistent.
