
botorch's Introduction


BoTorch is a library for Bayesian Optimization built on PyTorch.

BoTorch is currently in beta and under active development!

Why BoTorch?

BoTorch

  • Provides a modular and easily extensible interface for composing Bayesian optimization primitives, including probabilistic models, acquisition functions, and optimizers.
  • Harnesses the power of PyTorch, including auto-differentiation, native support for highly parallelized modern hardware (e.g. GPUs) using device-agnostic code, and a dynamic computation graph.
  • Supports Monte Carlo-based acquisition functions via the reparameterization trick, which makes it straightforward to implement new ideas without having to impose restrictive assumptions about the underlying model (see the sketch after this list).
  • Enables seamless integration with deep and/or convolutional architectures in PyTorch.
  • Has first-class support for state-of-the-art probabilistic models in GPyTorch, including support for multi-task Gaussian Processes (GPs), deep kernel learning, deep GPs, and approximate inference.
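
For example, a custom Monte Carlo acquisition function can be prototyped in a few lines by subclassing MCAcquisitionFunction. The snippet below is only an illustrative sketch of a toy qEI-style criterion; the exact sampler and base-class details vary between BoTorch versions.

import torch
from botorch.acquisition.monte_carlo import MCAcquisitionFunction
from botorch.utils.transforms import t_batch_mode_transform


class qToyExpectedImprovement(MCAcquisitionFunction):
    """Toy MC expected improvement, written from scratch for illustration."""

    def __init__(self, model, best_f, sampler=None):
        super().__init__(model=model, sampler=sampler)
        self.register_buffer("best_f", torch.as_tensor(best_f))

    @t_batch_mode_transform()
    def forward(self, X):
        # X has shape batch_shape x q x d
        posterior = self.model.posterior(X)
        samples = self.get_posterior_samples(posterior)     # sample_shape x batch_shape x q x m
        obj = samples[..., 0]                                # assumes a single-output model
        improvement = (obj - self.best_f).clamp_min(0.0)     # differentiable via the reparameterization trick
        return improvement.max(dim=-1).values.mean(dim=0)    # average over MC samples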

Target Audience

The primary audience for hands-on use of BoTorch is researchers and sophisticated practitioners in Bayesian Optimization and AI. We recommend using BoTorch as a low-level API for implementing new algorithms for Ax. Ax has been designed to be an easy-to-use platform for end-users, which at the same time is flexible enough for Bayesian Optimization researchers to plug into for handling of feature transformations, (meta-)data management, storage, etc. We recommend that end-users who are not actively doing research on Bayesian Optimization simply use Ax.

Installation

Installation Requirements

  • Python >= 3.10
  • PyTorch >= 1.13.1
  • gpytorch == 1.11
  • linear_operator == 0.5.1
  • pyro-ppl >= 1.8.4
  • scipy
  • multipledispatch

Prerequisite only for macOS users with Intel processors:

Before installing BoTorch, we recommend first manually installing PyTorch, a required dependency of BoTorch. Installing it according to the PyTorch installation instructions ensures that it is properly linked against MKL, a library that optimizes mathematical computation for Intel processors. This will result in up to an order-of-magnitude speed-up for Bayesian optimization, as at the moment, installing PyTorch from pip does not link against MKL.

The PyTorch installation instructions currently recommend:

  1. Install Anaconda. Note that there are different installers for Intel and M1 Macs.
  2. Install PyTorch following the PyTorch installation instructions. Currently, this suggests running conda install pytorch torchvision -c pytorch.

If you want to customize your installation, please follow the PyTorch installation instructions to build from source.

Option 1: Installing the latest release

The latest release of BoTorch is easily installed either via Anaconda (recommended) or pip.

To install BoTorch from Anaconda, run

conda install botorch -c pytorch -c gpytorch -c conda-forge

The above command installs BoTorch and any needed dependencies. -c pytorch -c gpytorch -c conda-forge means that the most preferred source to install from is the PyTorch channel, the next most preferred is the GPyTorch channel, and the least preferred is conda-forge.

Alternatively, to install with pip, do

pip install botorch

Note: Make sure the pip being used is actually the one from the newly created Conda environment. If you're using a Unix-based OS, you can use which pip to check.

Option 2: Installing from latest main branch

If you would like to try our bleeding edge features (and don't mind potentially running into the occasional bug here or there), you can install the latest development version directly from GitHub. If you want to also install the current gpytorch and linear_operator development versions, you will need to ensure that the ALLOW_LATEST_GPYTORCH_LINOP environment variable is set:

pip install --upgrade git+https://github.com/cornellius-gp/linear_operator.git
pip install --upgrade git+https://github.com/cornellius-gp/gpytorch.git
export ALLOW_LATEST_GPYTORCH_LINOP=true
pip install --upgrade git+https://github.com/pytorch/botorch.git

Option 3: Editable/dev install

If you want to contribute to BoTorch, you will want an editable install so that you can change files and have the changes reflected in your local install.

If you want to install the current gpytorch and linear_operator development versions, as in Option 2, do that before proceeding.

Option 3a: Bare-bones editable install

git clone https://github.com/pytorch/botorch.git
cd botorch
pip install -e .

Option 3b: Editable install with development and tutorials dependencies

git clone https://github.com/pytorch/botorch.git
cd botorch
export ALLOW_BOTORCH_LATEST=true
pip install -e ".[dev, tutorials]"
  • dev: Specifies tools necessary for development (testing, linting, docs building; see Contributing below).
  • tutorials: Also installs all packages necessary for running the tutorial notebooks.
  • You can also install either the dev or tutorials dependencies without installing both, e.g. by changing the last command to pip install -e ".[dev]".

Getting Started

Here's a quick rundown of the main components of a Bayesian optimization loop; a sketch of how one iteration feeds into the next follows the three steps below. For more details see our Documentation and the Tutorials.

  1. Fit a Gaussian Process model to data
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

# Double precision is highly recommended for GPs.
# See https://github.com/pytorch/botorch/discussions/1444
train_X = torch.rand(10, 2, dtype=torch.double)
Y = 1 - (train_X - 0.5).norm(dim=-1, keepdim=True)  # explicit output dimension
Y += 0.1 * torch.rand_like(Y)
train_Y = (Y - Y.mean()) / Y.std()

gp = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_mll(mll)
  2. Construct an acquisition function
from botorch.acquisition import UpperConfidenceBound

UCB = UpperConfidenceBound(gp, beta=0.1)
  3. Optimize the acquisition function
from botorch.optim import optimize_acqf

bounds = torch.stack([torch.zeros(2), torch.ones(2)]).to(torch.double)
candidate, acq_value = optimize_acqf(
    UCB, bounds=bounds, q=1, num_restarts=5, raw_samples=20,
)
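
Putting the three steps together, one iteration of the loop could be closed roughly as follows. This is an illustrative sketch that reuses the toy objective from step 1, not a prescribed pattern.

# Evaluate the candidate on the same toy objective, append it, and refit.
new_Y = 1 - (candidate - 0.5).norm(dim=-1, keepdim=True)
train_X = torch.cat([train_X, candidate])
Y = torch.cat([Y, new_Y])
train_Y = (Y - Y.mean()) / Y.std()  # re-standardize with the updated data
gp = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_mll(mll)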

Citing BoTorch

If you use BoTorch, please cite the following paper:

M. Balandat, B. Karrer, D. R. Jiang, S. Daulton, B. Letham, A. G. Wilson, and E. Bakshy. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. Advances in Neural Information Processing Systems 33, 2020.

@inproceedings{balandat2020botorch,
  title={{BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization}},
  author={Balandat, Maximilian and Karrer, Brian and Jiang, Daniel R. and Daulton, Samuel and Letham, Benjamin and Wilson, Andrew Gordon and Bakshy, Eytan},
  booktitle = {Advances in Neural Information Processing Systems 33},
  year={2020},
  url = {http://arxiv.org/abs/1910.06403}
}

See here for an incomplete selection of peer-reviewed papers that build off of BoTorch.

Contributing

See the CONTRIBUTING file for how to help out.

License

BoTorch is MIT licensed, as found in the LICENSE file.

botorch's People

Contributors

amyreese, balandat, bkarrer, bletham, danielrjiang, dependabot[bot], dme65, docusaurus-bot, ericzlou, esantorella, eytan, hvarfner, itsmrlin, jduerholt, jelena-markovic, kkashin, ldworkin, lena-kashtelyan, liangshi7, liusulin, mpolson64, qingfeng10, roussel-ryan, saitcakmak, sdaulton, sebastianament, vilockli, vishwakftw, wjmaddox, zpao


botorch's Issues

[Bug] No module named 'botorch.sampling'

๐Ÿ› Bug

When following the tutorial from https://botorch.org/tutorials/closed_loop_botorch_only I get a ModuleNotFoundError: No module named 'botorch.sampling'

To reproduce

Run notebook from https://botorch.org/files/closed_loop_botorch_only.ipynb in environment with:

botorch                   0.1.0                    pypi_0    pypi
gpytorch                  0.3.2                    pypi_0    pypi
python                    3.7.3                h0371630_0 
pytorch-cpu               1.1.0               py3.7_cpu_0    pytorch

I installed botorch via pip because the conda version relies on a conda version of gpytorch which requires pytorch with CUDA and not pytorch-cpu. I'm unable to install botorch with conda install botorch -c pytorch because of this.

** Stack trace/error message **

----> 4 from botorch.sampling.samplers import SobolQMCNormalSampler

ModuleNotFoundError: No module named 'botorch.sampling'

Expected Behavior

botorch.sampling can be imported.

System information

Please complete the following information:

  • BoTorch Version (0.1.0)
  • GPyTorch Version (0.3.2)
  • PyTorch Version (1.1.0)
  • Ubuntu 18.04

Additional context

Add any other context about the problem here.

Add num_outputs property to the model API

Currently, only BatchedMultiOutputGPyTorchModel has a _num_outputs property for internal bookkeeping. It would be good to have a num_outputs property defined for all models, so this can be used in a transparent fashion, e.g. for making sure that we don't use multi-output models with acquisition functions that do not support them.

I'm hesitant to add more and more sugar to the required model API so as to not make it super hard to implement custom models, but this change seems lightweight enough. We could also avoid making it an abstract property and instead just raise a NotImplementedError, so people who don't need it don't have to worry about it.
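
A rough sketch of the non-abstract variant described above (illustrative only, not the actual BoTorch model API):

import torch


class Model(torch.nn.Module):
    @property
    def num_outputs(self) -> int:
        # Not abstract: models that never need it can simply not override it.
        raise NotImplementedError


class MySingleOutputModel(Model):
    @property
    def num_outputs(self) -> int:
        return 1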

how to draw a sample path from a prior GP? (None inputs not supported)

Issue description

I'd like to do some simulation experiments where the function is drawn from a GP, but it seems that the current implementation of, e.g., SingleTaskGP doesn't support passing in None as train_inputs or train_targets the way the original ExactGP does. We could use train mode to ignore the training inputs, but GPyTorch forces a check that the inputs are consistent with model.train_inputs by hardcoding settings.debug.on = True.

Is there any other way to draw from a SingleTaskGP prior?
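
One possible workaround, sketched below under the assumption that gpytorch.settings.prior_mode is available in your GPyTorch version: construct the model with dummy data, switch to eval mode, and evaluate under prior_mode so that the training data is ignored.

import torch
import gpytorch
from botorch.models import SingleTaskGP

# Dummy data, only needed to be able to construct the model
train_X = torch.rand(1, 2, dtype=torch.double)
train_Y = torch.zeros(1, 1, dtype=torch.double)
gp = SingleTaskGP(train_X, train_Y)
gp.eval()

test_X = torch.rand(100, 2, dtype=torch.double)
with torch.no_grad(), gpytorch.settings.prior_mode(True):
    prior_mvn = gp(test_X)       # prior MVN; the training data is ignored
    sample = prior_mvn.sample()  # one sample path evaluated at test_X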

Add validation for output shapes in acquisition functions

As pointed out in #325, specifying no objective in an acquisition function if the model is multi-output may currently result in undefined behavior when optimizing the acquisition function.

We should add validation to the acquisition function constructors that will error out cleanly if a multi-output model is used but no objective is specified. This will require #295 to be addressed first.
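
The check could look roughly like the following sketch (the helper name is hypothetical, not actual BoTorch validation code):

def validate_single_output_or_objective(model, objective) -> None:
    # Relies on the num_outputs property discussed in #295.
    if getattr(model, "num_outputs", 1) > 1 and objective is None:
        raise RuntimeError(
            "A multi-output model was provided without an objective; "
            "please specify an objective for this acquisition function."
        )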

[Bug] SingleTaskGP's wrong gradients when batch_size = 1

๐Ÿ› Bug

When I try to calculate the gradient of the loss w.r.t. the input, I get different results on every run when I pass a single point to the GP.

To reproduce

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

X = torch.randn(100, 1)
y = X.sum(dim=1, keepdims=True)**2

gp = SingleTaskGP(X, y)

mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)

x_test = torch.randn(1, 1)
x_test.requires_grad_(True)

gp.eval()

# Calculate gradient w.r.t. to the same input point 5 times
for _ in range(5):
    loss = gp(x_test).mean.sum()
    loss.backward()

    print(x_test.grad)
    x_test = x_test.detach()
    x_test.requires_grad_(True)

The output looks like this

tensor([[0.0597]])
tensor([[-4.8402]])
tensor([[-6.9655e+37]])
tensor([[-2.6707e+17]])
tensor([[nan]])

Expected Behavior

I expect the same result in each iteration. The bug appears only when I try to evaluate the gradient at a single input point. If I use a batch size greater than 1:

x_test = torch.randn(2, 1)
x_test.requires_grad_(True)

gp.eval()

# Calculate gradient w.r.t. to the same input point 5 times
for _ in range(5):
    loss = gp(x_test).mean.sum()
    loss.backward()

    print(x_test.grad)
    x_test = x_test.detach()
    x_test.requires_grad_(True)

The output will be correct

tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])

System information

  • Botorch==0.1.3
  • GPyTorch=0.3.5
  • PyTorch==1.2.0
  • OS: Ubuntu 19.04

gradient of posterior mean w.r.t. x

Is there a way to get the gradient of the posterior mean w.r.t. x?
For example,

posterior = model.posterior(x)
x_grad = torch.autograd.grad(posterior.mean, x, create_graph=True)[0]

But it seems the posterior computation is currently hard-coded to be in eval mode, so posterior.mean does not have a grad_fn.

Is there a way around this?
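
In current BoTorch versions the posterior is differentiable with respect to the input, so something along these lines works (a sketch, assuming a fitted single-output model gp as in the Getting Started snippet):

import torch

x = torch.rand(1, 2, dtype=torch.double, requires_grad=True)
posterior = gp.posterior(x)
# Differentiate the (scalar-reduced) posterior mean w.r.t. the input
x_grad = torch.autograd.grad(posterior.mean.sum(), x)[0]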

ubuntu conda

Ubuntu 19.04, Python 3.7.3, using conda.
PyTorch 1.2.0; botorch does not install.

Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package ncurses conflicts for:
botorch -> ncurses[version='>=6.1,<7.0a0']
python=3.7 -> ncurses[version='>=6.1,<6.2.0a0|>=6.1,<7.0a0']
Package openssl conflicts for:
botorch -> openssl[version='>=1.1.1b,<1.1.2a']
python=3.7 -> openssl[version='>=1.0.2o,<1.0.3a|>=1.0.2p,<1.0.3a|>=1.1.1a,<1.1.2a|>=1.1.1b,<1.1.2a']
Package bzip2 conflicts for:
python=3.7 -> bzip2[version='>=1.0.6,<2.0a0']
Package pip conflicts for:
python=3.7 -> pip
Package tk conflicts for:
python=3.7 -> tk[version='>=8.6.7,<8.7.0a0|>=8.6.8,<8.7.0a0|>=8.6.9,<8.7.0a0']
botorch -> tk[version='>=8.6.8,<8.7.0a0']
Package pytorch conflicts for:
botorch -> pytorch[version='>=1.1']
Package libffi conflicts for:
botorch -> libffi[version='>=3.2.1,<4.0a0']
python=3.7 -> libffi[version='>=3.2.1,<3.3.0a0|>=3.2.1,<4.0a0']
Package sqlite conflicts for:
python=3.7 -> sqlite[version='>=3.24.0,<4.0a0|>=3.25.1,<4.0a0|>=3.25.2,<4.0a0|>=3.25.3,<4.0a0|>=3.26.0,<4.0a0|>=3.27.2,<4.0a0|>=3.28.0,<4.0a0']
botorch -> sqlite[version='>=3.28.0,<4.0a0']
Package xz conflicts for:
python=3.7 -> xz[version='>=5.2.3,<5.3.0a0|>=5.2.4,<5.3.0a0|>=5.2.4,<6.0a0']
botorch -> xz[version='>=5.2.4,<6.0a0']
Package libedit conflicts for:
botorch -> libedit[version='>=3.1.20181209,<3.2.0a0']
Package readline conflicts for:
python=3.7 -> readline[version='>=7.0,<8.0a0|>=8.0,<9.0a0']
botorch -> readline[version='>=7.0,<8.0a0']
Package scipy conflicts for:
botorch -> scipy
Package libstdcxx-ng conflicts for:
python=3.7 -> libstdcxx-ng[version='>=4.9|>=7.2.0|>=7.3.0']
Package zlib conflicts for:
botorch -> zlib[version='>=1.2.11,<1.3.0a0']
python=3.7 -> zlib[version='>=1.2.11,<1.3.0a0']
Package libgcc-ng conflicts for:
python=3.7 -> libgcc-ng[version='>=4.9|>=7.2.0|>=7.3.0']
Package gpytorch conflicts for:
botorch -> gpytorch[version='>=0.3.2|>=0.3.3|>=0.3.4']

[Bug] Documentation in botorch.utils.transforms.normalize is confusing/wrong

๐Ÿ› Bug

In the documentation for botorch.utils.transforms.normalize, it is written that the function is used to:

"Min-max normalize X to [0, 1] using the provided bounds."

However, this is slightly confusing, since the function will only transform X to [0,1] if the bounds provided represent the minimum and maximum of X. This ambiguous documentation string also features in the unnormalize function.

Furthermore, the example given for the normalize function actually uses the unnormalize function as shown below:

Example:
        >>> X = torch.rand(4, 3)
        >>> bounds = torch.stack([torch.zeros(3), 0.5 * torch.ones(3)])
        >>> X_normalized = unnormalize(X, bounds)

I am happy to submit a pull request, but I think the point of this function should first be made clear. If the point is to provide min-max scaling with user-provided bounds -- then the one line at the beginning of the doc-string is all that needs changing. However, if the function is to provide min-max scaling to [0,1] given the data provided in X, bounds should either be automatically calculated, or the example should be something like:

X_bounds = torch.stack([X_train.min()*torch.ones(X_train.shape[1]),
                                          X_train.max()*torch.ones(X_train.shape[1])])     
X_train_norm = normalize(X_train, X_bounds)

or

X_bounds = torch.stack([X_train.min(dim=0).values, X_train.max(dim=0).values])
X_train_norm = normalize(X_train, X_bounds)

Hope this is helpful!
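
For reference, a small round-trip sketch of the "scale the provided bounds to [0, 1]" semantics (illustrative, not the official docstring example):

import torch
from botorch.utils.transforms import normalize, unnormalize

bounds = torch.stack([torch.zeros(3), 0.5 * torch.ones(3)])
X = torch.rand(4, 3)                  # not necessarily within the bounds
X_norm = normalize(X, bounds)         # == (X - 0.0) / (0.5 - 0.0)
X_back = unnormalize(X_norm, bounds)  # recovers X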

[Bug] cholesky_cuda: U(1,1) error with certain types of Data on fit_gpytorch_model

๐Ÿ› Bug

When using fit_gpytorch_model on certain types of data, I consistently get a RuntimeError: cholesky_cuda: U(1,1) is zero, singular U. error. This issue had been partially fixed, then came back, and is currently fully present again.

To reproduce

Working Colab Example

** Stack trace/error message **

RuntimeError                              Traceback (most recent call last)
<ipython-input-4-7a985b2d2e2b> in <module>()
     53   mll = ExactMarginalLogLikelihood(model.likelihood, model,)
     54 
---> 55   fit_gpytorch_model(mll, optimizer=fit_gpytorch_torch)
     56   acq_function = UpperConfidenceBound(model, beta=0.1, maximize=False)
     57   candidates = joint_optimize(

10 frames
/usr/local/lib/python3.6/dist-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     19     """
     20     try:
---> 21         L = torch.cholesky(A, upper=upper, out=out)
     22         # TODO: Remove once fixed in pytorch (#16780)
     23         if A.dim() > 2 and A.is_cuda:

RuntimeError: cholesky_cuda: U(1,1) is zero, singular U.

Expected Behavior

For fit_gpytorch_model to successfully fit the model.

System information

Please complete the following information:
Master versions of botorch and gpytorch on Google Colab.

Question about fully bayesian treatment for GP hyperparameters

Hi.

While studying BoTorch's docs and tutorials, I ran into a question about inferring the GP kernel hyperparameters.

It seems that BoTorch infers GP kernel hyperparameters by optimizing the GP's marginal likelihood, and I cannot find a way to do a fully Bayesian treatment of the GP hyperparameters through MCMC (slice sampling or NUTS). Is this implemented in BoTorch?

Thank you.

Validate shape for best_f in qExpectedImprovement

๐Ÿ› Bug

Hi,
I am using BoTorch to do multi-objective optimization where the number of model outputs is m (m > 1). I found that optimization of the qEI acquisition function does not work as expected when m is greater than 1.

I think it is calling the function initialize_q_batch_nonneg (which qEI requires for initialization) and selecting the right indices by passing the flags in alpha_pos. alpha_pos is a 2D tensor, but torch.arange(len(Y)) gives a 1D tensor. Or did I miss something here?

I used some randomly generated X and Y tensors in the example below. I hope it is clear.

To reproduce

** Code snippet to reproduce **


import numpy as np
import torch
from botorch.models import SingleTaskGP, FixedNoiseGP, HeteroskedasticSingleTaskGP
from botorch.models.gpytorch import GPyTorchModel
from botorch.acquisition.monte_carlo import qExpectedImprovement, qNoisyExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.optim.fit import fit_gpytorch_torch
from botorch.sampling.samplers import SobolQMCNormalSampler


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float


def _get_and_fit_simple_gp(Xs, Ys, **kwargs):

    model = SingleTaskGP(train_X=Xs, train_Y=Ys)
    model.train();
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    mll.train();
    fit_gpytorch_torch(mll);
    mll.eval();
    model.eval();
    return model    


nsamples = 10
nx = 3
my = 2

train_X = torch.rand(nsamples,nx)
train_Y =  torch.rand(nsamples,my)

model = _get_and_fit_simple_gp(train_X, train_Y)

bounds = torch.stack([torch.zeros(nx), torch.ones(nx)])
best_value = torch.ones(my) 

acq_func = qExpectedImprovement(model, best_f= best_value )

new_X, acq_value = optimize_acqf(acq_func, bounds=bounds, q= 1, num_restarts=20, raw_samples=100)

** Stack trace/error message **

Traceback (most recent call last):
  File "c:\Users\yifan\Documents\GitHub\Learn-BO\testing_files\debug_temp_file.py", line 52, in <module>
    new_X, acq_value = optimize_acqf(acq_func, bounds=bounds, q= 1, num_restarts=20, raw_samples=100)
  File "C:\Users\yifan\Anaconda3\envs\torch\lib\site-packages\botorch\optim\optimize.py", line 148, in optimize_acqf
    options=options,
  File "C:\Users\yifan\Anaconda3\envs\torch\lib\site-packages\botorch\optim\initializers.py", line 103, in gen_batch_initial_conditions
    X=X_rnd, Y=Y_rnd, n=num_restarts, **init_kwargs
  File "C:\Users\yifan\Anaconda3\envs\torch\lib\site-packages\botorch\optim\initializers.py", line 347, in initialize_q_batch_nonneg
    alpha_pos_idcs = torch.arange(len(Y), device=Y.device)[alpha_pos]
IndexError: too many indices for tensor of dimension 1

Expected Behavior

qEI acquisition function can be correctly optimized and return the right next sampling points.

System information

Please complete the following information:

  • 0.1.4
  • 0.3.5
  • 1.2.0
  • Windows-10-10.0.17763-SP0

[Bug] Cholesky is NaN while fitting gpytorch model.

๐Ÿ› Bug

While fitting a gpytorch model to some training data (using fit_gpytorch_model()), I get the following runtime error (see facebook/Ax#99 and #179 for related issues). Strangely, this happens when running the code on my Mac and not when running it on an Ubuntu machine, despite using the same seed, exact same training data, and same software versions. It is worth noting that I observe the same behavior across a range of input data, e.g. when using other botorch test functions. In almost every case, the code runs on the Ubuntu machine without any issues.

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/utils/cholesky.py", line 24, in psd_safe_cholesky
    L = torch.cholesky(A, upper=upper, out=out)
RuntimeError: cholesky_cpu: U(2,2) is zero, singular U.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/saitcakmak/PycharmProjects/BoRisk/full_loop.py", line 60, in <module>
    fit_gpytorch_model(mll)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/fit.py", line 101, in fit_gpytorch_model
    mll, _ = optimizer(mll, track_iterations=False, **kwargs)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/optim/fit.py", line 225, in fit_gpytorch_scipy
    callback=cb,
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 600, in minimize
    callback=callback, **options)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 326, in function_wrapper
    return function(*(wrapper_args + args))
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 64, in __call__
    fg = self.fun(x, *args)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/optim/fit.py", line 283, in _scipy_objective_and_grad
    raise e  # pragma: nocover
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/optim/fit.py", line 278, in _scipy_objective_and_grad
    loss = -mll(*args).sum()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/module.py", line 24, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 51, in forward
    res = output.log_prob(target)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/distributions/multivariate_normal.py", line 107, in log_prob
    return super().log_prob(value)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/distributions/multivariate_normal.py", line 208, in log_prob
    M = _batch_mahalanobis(self._unbroadcasted_scale_tril, diff)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/distributions/multivariate_normal.py", line 50, in _unbroadcasted_scale_tril
    ust = delazify(self.lazy_covariance_matrix.cholesky())
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 738, in cholesky
    res = self._cholesky()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/utils/memoize.py", line 34, in g
    add_to_cache(self, cache_name, method(self, *args, **kwargs))
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 413, in _cholesky
    cholesky = psd_safe_cholesky(evaluated_mat).contiguous()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/utils/cholesky.py", line 30, in psd_safe_cholesky
    f"cholesky_cpu: {isnan.sum().item()} of {A.numel()} elements of the {A.shape} tensor are NaN."
gpytorch.utils.errors.NanError: cholesky_cpu: 90 of 100 elements of the torch.Size([10, 10]) tensor are NaN.

To reproduce

Here is the relevant code snippet. As I mentioned, I cannot reproduce this on my Ubuntu machine.
** Code snippet to reproduce **

# Your code goes here
# Please make sure it does not require any external dependencies

import torch
from torch import Tensor
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood
import gpytorch
from torch.distributions import Uniform, Gamma
from typing import Union
from botorch.optim import optimize_acqf
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.constraints.constraints import GreaterThan
from gpytorch.priors.torch_priors import GammaPrior
from botorch.test_functions import Hartmann, ThreeHumpCamel, Beale, Branin, Powell
from standardized_function import StandardizedFunction

# fix the seed for testing
torch.manual_seed(0)

# Initialize the test function
noise_std = 0.1  # observation noise level
function = StandardizedFunction(Powell(noise_std=noise_std))

d = function.dim  # dimension of train_X
dim_w = 1  # dimension of w component
n = 2 * d + 2  # training samples
dim_x = d - dim_w  # dimension of the x component
train_X = torch.rand((n, d))
train_Y = function(train_X)

"""
StandardizedFunction is a class that transforms the domain of the SyntheticTestFunction to the unit hypercube. Here is the exact training data taken from the debug screen. I verified that this is the same across the two machines as well.

train_X
tensor([[0.4963, 0.7682, 0.0885, 0.1320],
        [0.3074, 0.6341, 0.4901, 0.8964],
        [0.4556, 0.6323, 0.3489, 0.4017],
        [0.0223, 0.1689, 0.2939, 0.5185],
        [0.6977, 0.8000, 0.1610, 0.2823],
        [0.6816, 0.9152, 0.3971, 0.8742],
        [0.4194, 0.5529, 0.9527, 0.0362],
        [0.1852, 0.3734, 0.3051, 0.9320],
        [0.1759, 0.2698, 0.1507, 0.0317],
        [0.2081, 0.9298, 0.7231, 0.7423]])
train_Y
tensor([[ 9581.6172],
        [ 8215.9570],
        [  426.3773],
        [ 4815.8296],
        [ 7884.1069],
        [ 2833.5762],
        [ 6308.6538],
        [20651.4551],
        [  553.5855],
        [ 7070.3433]])
"""

# construct and fit the GP
# a more involved prior to set a significant lower bound on the noise. Significantly speeds up computation.
noise_prior = GammaPrior(1.1, 0.5)
noise_prior_mode = (noise_prior.concentration - 1) / noise_prior.rate
likelihood = GaussianLikelihood(
    noise_prior=noise_prior,
    batch_shape=[],
    noise_constraint=GreaterThan(
        0.05,  # minimum observation noise assumed in the GP model
        transform=None,
        initial_value=noise_prior_mode,
    ),
)
gp = SingleTaskGP(train_X, train_Y, likelihood)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)

** Stack trace/error message **

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/utils/cholesky.py", line 24, in psd_safe_cholesky
    L = torch.cholesky(A, upper=upper, out=out)
RuntimeError: cholesky_cpu: U(2,2) is zero, singular U.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/fit.py", line 101, in fit_gpytorch_model
    mll, _ = optimizer(mll, track_iterations=False, **kwargs)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/optim/fit.py", line 225, in fit_gpytorch_scipy
    callback=cb,
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 600, in minimize
    callback=callback, **options)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 326, in function_wrapper
    return function(*(wrapper_args + args))
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 64, in __call__
    fg = self.fun(x, *args)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/optim/fit.py", line 283, in _scipy_objective_and_grad
    raise e  # pragma: nocover
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botorch/optim/fit.py", line 278, in _scipy_objective_and_grad
    loss = -mll(*args).sum()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/module.py", line 24, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 51, in forward
    res = output.log_prob(target)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/distributions/multivariate_normal.py", line 107, in log_prob
    return super().log_prob(value)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/torch/distributions/multivariate_normal.py", line 208, in log_prob
    M = _batch_mahalanobis(self._unbroadcasted_scale_tril, diff)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/distributions/multivariate_normal.py", line 50, in _unbroadcasted_scale_tril
    ust = delazify(self.lazy_covariance_matrix.cholesky())
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 738, in cholesky
    res = self._cholesky()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/utils/memoize.py", line 34, in g
    add_to_cache(self, cache_name, method(self, *args, **kwargs))
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 413, in _cholesky
    cholesky = psd_safe_cholesky(evaluated_mat).contiguous()
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gpytorch/utils/cholesky.py", line 30, in psd_safe_cholesky
    f"cholesky_cpu: {isnan.sum().item()} of {A.numel()} elements of the {A.shape} tensor are NaN."
gpytorch.utils.errors.NanError: cholesky_cpu: 90 of 100 elements of the torch.Size([10, 10]) tensor are NaN.

Expected Behavior

fit_gpytorch_model() fits the GP hyper parameters and code continues execution, as it does on Ubuntu.

System information

Please complete the following information:

  • BoTorch: 0.1.4
  • GPyTorch: 0.3.6
  • PyTorch: 1.3.1
  • OS: macOS Catalina 10.15.1 (error message), Ubuntu 18.04.3 LTS (code runs without any issues)

[Bug] NaNs encountered in running fit_gpytorch_model

๐Ÿ› Bug

I'm consistently encountering NaNs in the optimization loop of fit_gpytorch_model on simple examples. Any help would be appreciated.

To reproduce

The code (relevant part) looks like this.

def initialize_model(x0, y0, n=5):
    torch.manual_seed(0)
    # generate training data
    train_x = torch.randn(n, args.dim, device=device, dtype=dtype)
    
    train_obj = obj_func(decode(train_x), x0, y0)
    best_observed_value = train_obj.max().item()

    # define models for objective and constraint
    model = SingleTaskGP(train_X=train_x, train_Y=train_obj)
    model = model.to(train_x)

    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    mll = mll.to(train_x)

    return train_x, train_obj, mll, model, best_observed_value


def optimize_acqf_and_get_observation(acq_func, x0, y0):
    """Optimizes the acquisition function, and returns a new candidate and a noisy observation"""
    """
    with torch.no_grad():
        _, mu, logvar = vae_model(x0)
    bounds = torch.stack([mu - 3.0 * torch.exp(0.5 * logvar),
                          mu + 3.0 * torch.exp(0.5 * logvar)])
    """
    # optimize
    candidates = joint_optimize(
        acq_function=acq_func,
        bounds=bounds,
        q=BATCH_SIZE,
        num_restarts=10,
        raw_samples=200,
    )

    # observe new values
    new_x = candidates.detach()

    new_obj = obj_func(decode(new_x), x0, y0)
    return new_x, new_obj

def bayes_opt(x0, y0):
    #print(f"\nRunning BO ", end='')
    best_observed = []
    success = 0

    # call helper function to initialize model
    train_x, train_obj, mll, model, best_value = initialize_model(x0, y0, n=5)
    best_observed.append(best_value)
    #print(train_obj)
    # run N_BATCH rounds of BayesOpt after the initial random batch
    for itr in range(N_BATCH):

        fit_gpytorch_model(mll)

        # define the qNEI acquisition module using a QMC sampler
        qmc_sampler = SobolQMCNormalSampler(num_samples=MC_SAMPLES, seed=seed)
        qEI = qExpectedImprovement(model=model, sampler=qmc_sampler, best_f=best_value)

        # optimize and get new observation
        new_x, new_obj = optimize_acqf_and_get_observation(qEI, x0, y0)

        # update training points
        train_x = torch.cat((train_x, new_x))
        train_obj = torch.cat((train_obj, new_obj))

        # update progress
        #best_value = score_image_recognition(decode(train_x)).max().item()
        best_value, best_index = train_obj.max(0)
        best_observed.append(best_value.item())
        # reinitialize the model so it is ready for fitting on next iteration
        model.set_train_data(train_x, train_obj, strict=False)
        best_candidate = decode(train_x[best_index].unsqueeze(0))

** Stack trace/error message **

Traceback (most recent call last):
  File "vae_mnist2.py", line 371, in <module>
    attack_mnist_bayes(num_attacks=args.num_attacks)
  File "vae_mnist2.py", line 316, in attack_mnist_bayes
    itr, success, pert = bayes_opt(image, label)
  File "vae_mnist2.py", line 224, in bayes_opt
    fit_gpytorch_model(mll)
  File "/home/snshukla/py33/lib/python3.7/site-packages/botorch/fit.py", line 98, in fit_gpytorch_model
    mll, _ = optimizer(mll, track_iterations=False, **kwargs)
  File "/home/snshukla/py33/lib/python3.7/site-packages/botorch/optim/fit.py", line 192, in fit_gpytorch_scipy
    callback=cb,
  File "/home/snshukla/py33/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 600, in minimize
    callback=callback, **options)
  File "/home/snshukla/py33/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/snshukla/py33/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/home/snshukla/py33/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 326, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/snshukla/py33/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 64, in __call__
    fg = self.fun(x, *args)
  File "/home/snshukla/py33/lib/python3.7/site-packages/botorch/optim/fit.py", line 239, in _scipy_objective_and_grad
    raise e  # pragma: nocover
  File "/home/snshukla/py33/lib/python3.7/site-packages/botorch/optim/fit.py", line 234, in _scipy_objective_and_grad
    loss = -mll(*args).sum()
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/module.py", line 22, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 28, in forward
    res = output.log_prob(target)
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/distributions/multivariate_normal.py", line 128, in log_prob
    inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 1041, in inv_quad_logdet
    *args,
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/functions/_inv_quad_log_det.py", line 112, in forward
    solves, t_mat = lazy_tsr._solve(rhs, preconditioner, num_tridiag=num_random_probes)
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 630, in _solve
    preconditioner=preconditioner,
  File "/home/snshukla/py33/lib/python3.7/site-packages/gpytorch/utils/linear_cg.py", line 161, in linear_cg
    raise RuntimeError("NaNs encounterd when trying to perform matrix-vector multiplication")
RuntimeError: NaNs encounterd when trying to perform matrix-vector multiplication

System information

Please complete the following information:

  • BoTorch Version: 0.1.1
  • GPyTorch Version: 0.3.3
  • PyTorch Version: 1.1.0
  • Computer OS: CentOS

[Question] How does optimize_acqf interact with multi-task GP?

Thanks for the cool project! I'm particularly interested in the multi-task modeling capability in both Ax and botorch. I've been trying to understand a little better how botorch handles this scenario, and I've hit a little bit of a wall in my understanding.

Specifically, how does acquisition function optimization interact with a multi-task GP model? When initializing a multi-task GP, we can specify which output tasks we'd like to model, eg MultiTaskGP(train_X, train_Y, task_feature=-1, output_tasks=[1]). If a single output task is specified, then the gpytorch connector will give us a multivariate normal, while if multiple output tasks are specified, then the gpytorch connector gives us a multi-task multivariate normal instead: https://github.com/pytorch/botorch/blob/master/botorch/models/gpytorch.py#L480-L489

Now let's throw in an acquisition function, eg qNoisyExpectedImprovement. Here we define a way to evaluate candidate points to test, based on the posterior of the fitted model. Meanwhile, optimize_acqf will use the acquisition function in generating and selecting candidate points based on the acquisition function's evaluation.

So far so good! But here's the heart of the question, which I'm having a hard time understanding from the code: how are the candidate points determined when we have a multi-task GP? Are they optimized for expected performance on all tasks or only on a specific task? Does this depend on what output tasks we specify in the model initialization (and thus, on whether we are estimating a multi-task multivariate posterior or not?) Empirically, it doesn't seem to, but I'm not sure where the smoking gun in the code is that determines this.
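
For concreteness, here is a small construction sketch consistent with the description above (the data shapes and task assignments are made up, and the hyperparameters are left at their defaults rather than fitted):

import torch
from botorch.models import MultiTaskGP

# 20 points in 2-d; the last column holds the task index (0 or 1)
X = torch.rand(20, 2, dtype=torch.double)
tasks = torch.cat([torch.zeros(10, 1), torch.ones(10, 1)]).to(torch.double)
train_X = torch.cat([X, tasks], dim=-1)
train_Y = torch.rand(20, 1, dtype=torch.double)

# Model both tasks jointly, but only expose task 1 as an output
mtgp = MultiTaskGP(train_X, train_Y, task_feature=-1, output_tasks=[1])
posterior = mtgp.posterior(torch.rand(5, 2, dtype=torch.double))  # single-output posterior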

Parallelize evaluations of forward model

I have implemented a genetic algorithm that calls forward evaluations of a model that I have fitted previously (in particular, a FixedNoiseGP). However, I am having problems parallelizing it with multiprocessing.

I have tried to use multiprocessing.Pool(), but then I get an error with pickling certain torch functions:
Can't pickle <built-in function softplus>: import of module 'torch._C._nn' failed

I have also tried with multiprocessing.Process(), but it hangs up in the forward evaluation of the FixedNoiseGP. Interestingly, if I write a print command inside the forward method of FixedNoiseGP, I can see that the MultivariateNormal is indeed being evaluated, but for some reason not passed to the process.

Any idea how to solve this? Or are there other options for using a BoTorch model inside a parallel framework?
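
Not an answer to the pickling problem, but a commonly used alternative is to skip multiprocessing entirely and batch the forward evaluations instead, since GPyTorch evaluates a whole batch of points in one call. A sketch, assuming a fitted FixedNoiseGP named model:

import torch

d = 3  # input dimension (assumed)
population = torch.rand(256, d, dtype=torch.double)
with torch.no_grad():
    posterior = model.posterior(population)
    fitness = posterior.mean.squeeze(-1)  # one call evaluates all 256 candidates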

[Bug] Lapack Error when evaluating new experiment trial

๐Ÿ› Bug

I tried to implement the example from https://botorch.org/tutorials/custom_botorch_model_in_ax, but with the Branin function replaced by an LGBM model fit.
Search space:

from ax import ParameterType, RangeParameter, SearchSpace, ChoiceParameter
search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name="bagging_freq", parameter_type=ParameterType.INT, lower=1, upper=10, 
        ),
        RangeParameter(
            name="learning_rate", parameter_type=ParameterType.FLOAT, lower=10e-5, upper=10e-1
        ),
        RangeParameter(
            name="min_data_in_leaf", parameter_type=ParameterType.INT, lower=10, upper=1000
        ),
        RangeParameter(
            name="min_sum_hessian_in_leaf", parameter_type=ParameterType.INT, lower=10, upper=1000
        ),
        RangeParameter(
            name="num_leaves", parameter_type=ParameterType.INT, lower=2, upper=128
        ), 
        RangeParameter(
            name="max_depth", parameter_type=ParameterType.INT, lower=1, upper=20
        ),
        RangeParameter(
            name="num_treas", parameter_type=ParameterType.INT, lower=100, upper=1000
        ),
        ChoiceParameter(
            name='boost', parameter_type=ParameterType.STRING, values = ['gbdt', 'gbrt', 'goss']
        )
    ]
)

After some number of evaluations of exp.new_trial I got this error:

Lapack Error syev : 2 off-diagonal elements didn't converge to zero at /pytorch/aten/src/TH/generic/THTensorLapack.cpp:296

Is this a bug, or am I doing something wrong?

System information

Please complete the following information:

  • BoTorch Version: 0.1.1
  • GPyTorch Version: 0.3.3
  • PyTorch Version: 1.1.1
  • Computer OS : Ubuntu 19.04

Integrating over something else than priors

Hi,

I'm currently trying to modify some code related to Heteroskedastic GPs, and want to add some MCMC component to it, to essentially circumvent some instability in the convergence process.
The dataset is quite small (1d, few thousand points at most), so performance is not a huge issue, although I want to have a somewhat efficient code.

The problem is as follows:

  1. First estimate the noise through some homoskedastic GP
  2. Fit a gp to the log of the noise levels. This is ideally the distribution I want to sample from
  3. Sample N points from the gp posterior for the noise at each location x, this results in a N * n_batch * 1 sample.
  4. Forward this to my heteroskedastic model, which accepts a diagonal variance filled with the noise levels from my GP posterior.
  5. Obtain a joint distribution P(Y | Z, X) * P(Z | X) that I would like to integrate w.r.t. Z by sampling.

However, I am relatively new with this, and I can't quite yet grasp what the gpytorch.distributions.MultivariateNormal allows me to do.

I know I can get log_prob, which doesn't quite help me without some trick, and that Pyro could naturally help me if I were sampling from a prior over some hyperparameter.

One solution would be to simply set this noise as a prior and let pyro do its work, but not sure how to do this yet, as it would depend conditionally on the inputs.

Anyway, to give an idea of what I'm currently doing, you can find some sample code below!

Sample code

Here is where essentially I'd like to introduce sampling and my integration:

class myHeteroGP(SingleTaskGP):
    def __init__(self, train_x, train_y, train_y_var, internal_samples=200):
        # model given the noise.
        mean_module = gpytorch.means.ConstantMean()
        covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        noise_gp_covar = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())


        validate_input_scaling(train_X=train_x, train_Y=train_x, train_Yvar=train_y_var)
        #self._validate_tensor_args(X=train_x, Y=train_y, Yvar=train_y_var)
        #self._set_dimensions(train_X=train_x, train_Y=train_y)

        noise_likelihood = GaussianLikelihood(
            noise_prior=SmoothedBoxPrior(-3, 5, 0.5, transform=torch.log),
            #batch_shape=self._aug_batch_shape,
            noise_constraint=GreaterThan(
                MIN_INFERRED_NOISE_LEVEL, transform=None, initial_value=1.0
            ))

        noise_model = SingleTaskGP(
            train_X=train_x,
            train_Y=(train_y_var+1e-06).log(),
            likelihood=noise_likelihood,
            covar_module=noise_gp_covar
            )

        likelihood =  _GaussianLikelihoodBase(HeteroskedasticNoise(noise_model))
        super().__init__(train_X=train_x,
                         train_Y=train_y,
                         likelihood=likelihood,
                         covar_module=covar_module)


        self.to(train_x)

        self.itg_sampler = SobolQMCNormalSampler(internal_samples)
        #print(self.likelihood, self.model)

        self.noise_likelihood = noise_likelihood
        self.noise_model = noise_model


    def forward(self,x):
        # fit noise gp
        noise_mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.noise_likelihood,
                                                    self.noise_model)
        print('fitting noise gp on log(y_var) as function of x')
        botorch.fit.fit_gpytorch_model(noise_mll)

        z_post = self.noise_model.posterior(x)
        z_samples = self.itg_sampler(z_post)
        #samples are `n_samples` x ` n_batch` x `n_dims`
        print(z_samples.shape)

        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        f_samples = MultivariateNormal(mean_x, covar_x)

        res = self.likelihood(f_samples, noise=z_samples.exp())
        # have some way to integrate over the z variable using the samples,
        # while still keeping the variables attached to the graph
        # to allow a fit of the noise hyperparameters

I essentially just want to know whether this idea of incorporating some sampling into the main forward loop is a good one, or whether I should resort to different solutions altogether!

Convergence issues when fitting simple GPs with scipy

Issue description

I'm running into frequent issues with convergence when using the scipy optimizer, even when working on simple problems. Any advice or possible improvements to the library that could make things more robust?

I'm usually able to get things working by catching the error from fit_gpytorch_scipy and falling back to fit_gpytorch_torch. Is that the suggested way of handling things?
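
For reference, the fallback pattern described above looks roughly like this on these older BoTorch versions, where fit_gpytorch_scipy and fit_gpytorch_torch live in botorch.optim.fit and mll is the ExactMarginalLogLikelihood constructed in the code example below:

from botorch.fit import fit_gpytorch_model
from botorch.optim.fit import fit_gpytorch_scipy, fit_gpytorch_torch

try:
    fit_gpytorch_model(mll, optimizer=fit_gpytorch_scipy)
except RuntimeError:
    # Fall back to the torch (Adam-based) optimizer if L-BFGS-B fails
    fit_gpytorch_model(mll, optimizer=fit_gpytorch_torch)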

Code example

Note: Training data is from the Branin function.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood


#%%
train_X = torch.Tensor(
    [
        [-8.6553e-01, 1.4892e01],
        [-3.1153e00, 7.7120e-01],
        [2.5440e00, 1.0344e00],
        [2.2272e00, 8.8403e00],
        [-3.9356e00, 7.2259e00],
        [-3.6305e00, 9.8372e00],
        [-4.7356e00, 1.0007e01],
        [-3.6515e00, 1.2925e00],
        [-4.1501e00, 1.3839e01],
        [9.8549e00, 1.3155e00],
        [-4.1514e00, 1.0000e01],
        [-3.6748e00, 1.0351e01],
        [-4.3676e00, 1.0560e01],
        [-2.9590e00, 1.0114e01],
        [-2.8987e00, 1.0865e01],
        [-3.7215e00, 1.1382e01],
        [-3.8220e00, 1.2646e01],
        [-2.5949e00, 1.2207e01],
        [-2.4585e00, 1.3816e01],
        [-9.0225e-01, 1.3051e01],
        [-7.6336e-01, 1.0695e01],
        [0.0000e00, 1.1832e01],
        [-3.1055e00, 1.5000e01],
        [-5.0000e00, 1.5000e01],
        [-3.5418e-17, 1.0000e01],
        [-5.0000e00, 1.2259e01],
    ]
)

train_Y = torch.Tensor(
    [
        1.2412e-01,
        2.0129e-01,
        2.7141e-01,
        1.2959e-01,
        2.7173e-01,
        2.5481e-01,
        1.9756e-01,
        2.2516e-01,
        1.3399e-01,
        -4.8939e00,
        2.3243e-01,
        2.4574e-01,
        2.0955e-01,
        2.6262e-01,
        2.5288e-01,
        2.2504e-01,
        1.9062e-01,
        2.3082e-01,
        1.9413e-01,
        1.7714e-01,
        2.1897e-01,
        1.7245e-01,
        1.5141e-01,
        8.9431e-04,
        2.0910e-01,
        1.1043e-01,
    ]
)


#%%
gp = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)

#%%

With this input, fitting the model fails with the stack trace

~/.pyenv/versions/botorch/lib/python3.7/site-packages/botorch/fit.py in fit_gpytorch_model(mll, optimizer, **kwargs)
     33     """
     34     mll.train()
---> 35     mll, _ = optimizer(mll, track_iterations=False, **kwargs)
     36     mll.eval()
     37     return mll

~/.pyenv/versions/botorch/lib/python3.7/site-packages/botorch/optim/fit.py in fit_gpytorch_scipy(mll, bounds, method, options, track_iterations)
    186         jac=True,
    187         options=options,
--> 188         callback=cb,
    189     )
    190     iterations = []

~/.pyenv/versions/botorch/lib/python3.7/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    599     elif meth == 'l-bfgs-b':
    600         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 601                                 callback=callback, **options)
    602     elif meth == 'tnc':
    603         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

~/.pyenv/versions/botorch/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    333             # until the completion of the current minimization iteration.
    334             # Overwrite f and g:
--> 335             f, g = func_and_grad(x)
    336         elif task_str.startswith(b'NEW_X'):
    337             # new iteration

~/.pyenv/versions/botorch/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py in func_and_grad(x)
    283     else:
    284         def func_and_grad(x):
--> 285             f = fun(x, *args)
    286             g = jac(x, *args)
    287             return f, g

~/.pyenv/versions/botorch/lib/python3.7/site-packages/scipy/optimize/optimize.py in function_wrapper(*wrapper_args)
    298     def function_wrapper(*wrapper_args):
    299         ncalls[0] += 1
--> 300         return function(*(wrapper_args + args))
    301 
    302     return ncalls, function_wrapper

~/.pyenv/versions/botorch/lib/python3.7/site-packages/scipy/optimize/optimize.py in __call__(self, x, *args)
     61     def __call__(self, x, *args):
     62         self.x = numpy.asarray(x).copy()
---> 63         fg = self.fun(x, *args)
     64         self.jac = fg[1]
     65         return fg[0]

~/.pyenv/versions/botorch/lib/python3.7/site-packages/botorch/optim/fit.py in _scipy_objective_and_grad(x, mll, property_dict)
    221     output = mll.model(*train_inputs)
    222     args = [output, train_targets] + _get_extra_mll_args(mll)
--> 223     loss = -mll(*args).sum()
    224     loss.backward()
    225     param_dict = OrderedDict(mll.named_parameters())

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
     20 
     21     def __call__(self, *inputs, **kwargs):
---> 22         outputs = self.forward(*inputs, **kwargs)
     23         if isinstance(outputs, list):
     24             return [_validate_module_outputs(output) for output in outputs]

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py in forward(self, output, target, *params)
     26         # Get the log prob of the marginal distribution
     27         output = self.likelihood(output, *params)
---> 28         res = output.log_prob(target)
     29 
     30         # Add terms for SGPR / when inducing points are learned

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/distributions/multivariate_normal.py in log_prob(self, value)
    127 
    128         # Get log determininat and first part of quadratic form
--> 129         inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
    130 
    131         res = -0.5 * sum([inv_quad, logdet, diff.size(-1) * math.log(2 * math.pi)])

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py in inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
    990             from .chol_lazy_tensor import CholLazyTensor
    991 
--> 992             cholesky = CholLazyTensor(self.cholesky())
    993             return cholesky.inv_quad_logdet(inv_quad_rhs=inv_quad_rhs, logdet=logdet, reduce_inv_quad=reduce_inv_quad)
    994 

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py in cholesky(self, upper)
    716             (LazyTensor) Cholesky factor (lower triangular)
    717         """
--> 718         res = self._cholesky()
    719         if upper:
    720             res = res.transpose(-1, -2)

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
     32         cache_name = name if name is not None else method
     33         if not is_in_cache(self, cache_name):
---> 34             add_to_cache(self, cache_name, method(self, *args, **kwargs))
     35         return get_from_cache(self, cache_name)
     36 

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py in _cholesky(self)
    401             evaluated_mat.register_hook(_ensure_symmetric_grad)
    402 
--> 403         cholesky = psd_safe_cholesky(evaluated_mat.double()).to(self.dtype)
    404         return NonLazyTensor(cholesky)
    405 

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     45                 continue
     46 
---> 47         raise e
     48 
     49 

~/.pyenv/versions/botorch/lib/python3.7/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     19     """
     20     try:
---> 21         L = torch.cholesky(A, upper=upper, out=out)
     22         # TODO: Remove once fixed in pytorch (#16780)
     23         if A.dim() > 2 and A.is_cuda:

RuntimeError: cholesky_cpu: U(1,1) is zero, singular U.

Digging a little bit with the debugger, I discovered that the problem is not actually that the matrix is singular, but that there are NaNs in the input matrix. Taking a look at the GP parameters after catching the error shows that the optimizer has moved into a region where the lengthscale parameters are large negative values:

print(list(gp.named_parameters()))

[('likelihood.noise_covar.raw_noise', Parameter containing:
  tensor([0.2566], requires_grad=True)),
 ('mean_module.constant', Parameter containing:
  tensor([72.2831], requires_grad=True)),
 ('covar_module.raw_outputscale', Parameter containing:
  tensor(-350.1570, requires_grad=True)),
 ('covar_module.base_kernel.raw_lengthscale', Parameter containing:
  tensor([[-391.1742, -267.7276]], requires_grad=True))]

My best guess as to what's happening is that gpytorch is applying its Positive constraint to the lengthscale by computing softplus(-391.1742), which is, to machine precision, 0. This causes a divide by zero in the Matérn kernel, which leads to the aforementioned NaNs.
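For reference, a quick standalone check with plain PyTorch (outside of gpytorch, using the raw values from the parameter printout above) confirms that the softplus transform underflows to exactly zero for raw values of this magnitude:

import torch
import torch.nn.functional as F

# raw lengthscale values taken from the parameter printout above
raw = torch.tensor([-391.1742, -267.7276])
print(F.softplus(raw))  # tensor([0., 0.]) -- exp(raw) underflows, so log1p(0) is exactly 0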

System Info

Please provide information about your setup, including

  • BoTorch Version: 0.1.0
  • GPyTorch Version 0.3.2
  • PyTorch Version 1.1.0
  • OS X Mojave (10.14.3)

Add decorator for pending point handling

Currently, MC acquisition functions all have the following piece of code at the beginning of their forward method:

if self.X_pending is not None:
    X = torch.cat([X, match_batch_shape(self.X_pending, X)], dim=-2)

We should move this into a @concatenate_pending_points decorator that does this automatically, so pending points can be supported in a more lightweight fashion in new acquisition functions, requiring less code in forward.
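For illustration, a minimal sketch of what such a decorator could look like (using the name proposed above; this is an assumption, not necessarily the eventual implementation):

from functools import wraps

import torch
from botorch.utils.transforms import match_batch_shape

def concatenate_pending_points(method):
    """Sketch: prepend the X_pending concatenation to an MC acquisition
    function's forward so individual implementations don't repeat it."""
    @wraps(method)
    def decorated(self, X):
        if self.X_pending is not None:
            X = torch.cat([X, match_batch_shape(self.X_pending, X)], dim=-2)
        return method(self, X)
    return decorated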

[Bug] Multi-objective Optimization with qUCB

I tried to use BoTorch for multi(two)-objective optimization. I used the custom acquisition function from here. It works well with joint_optimize but not with sequential_optimize. You can find both here. This is the error I got. I'm sorry I couldn't share my code here.

** Stack trace/error message **

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/botorch/optim/optimize.py in sequential_optimize(acq_function, bounds, q, num_restarts, raw_samples, options, inequality_constraints, equality_constraints, fixed_features, post_processing_func)
     79             inequality_constraints=inequality_constraints,
     80             equality_constraints=equality_constraints,
---> 81             fixed_features=fixed_features,
     82         )
     83         if post_processing_func is not None:

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/botorch/optim/optimize.py in joint_optimize(acq_function, bounds, q, num_restarts, raw_samples, options, inequality_constraints, equality_constraints, fixed_features, post_processing_func)
    150         num_restarts=num_restarts,
    151         raw_samples=raw_samples,
--> 152         options=options,
    153     )
    154     batch_limit = options.get("batch_limit", num_restarts)

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/botorch/optim/optimize.py in gen_batch_initial_conditions(acq_function, bounds, q, num_restarts, raw_samples, options)
    247                 while start_idx < X_rnd.shape[0]:
    248                     end_idx = min(start_idx + batch_limit, X_rnd.shape[0])
--> 249                     Y_rnd_curr = acq_function(X_rnd[start_idx:end_idx])
    250                     Y_rnd_list.append(Y_rnd_curr)
    251                     start_idx += batch_limit

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/botorch/utils/transforms.py in decorated(cls, X)
    126                 )
    127             X = X if X.dim() > 2 else X.unsqueeze(0)
--> 128             return method(cls, X)
    129 
    130         return decorated

~/Desktop/code/bo-benchmarks/mc_acqf.py in forward(self, X)
     46         # o: output dimensions
     47         posterior = self.model.posterior(X)
---> 48         samples = self.sampler(posterior)  # n x b x q x o
     49         scalarized_samples = samples.matmul(self.weights)  # n x b x q
     50         mean = posterior.mean  # b x q x o

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/botorch/sampling/samplers.py in forward(self, posterior)
     53         self._construct_base_samples(posterior=posterior, shape=base_sample_shape)
     54         samples = posterior.rsample(
---> 55             sample_shape=self.sample_shape, base_samples=self.base_samples
     56         )
     57         return samples

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/botorch/posteriors/gpytorch.py in rsample(self, sample_shape, base_samples)
     78         with gpytorch.settings.fast_computations(covar_root_decomposition=False):
     79             samples = self.mvn.rsample(
---> 80                 sample_shape=sample_shape, base_samples=base_samples
     81             )
     82         # make sure there always is an output dimension

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/distributions/multitask_multivariate_normal.py in rsample(self, sample_shape, base_samples)
    113             base_samples = base_samples.view(*sample_shape, *self.loc.shape)
    114 
--> 115         samples = super().rsample(sample_shape=sample_shape, base_samples=base_samples)
    116         if not self._interleaved:
    117             # flip shape of last two dimensions

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/distributions/multivariate_normal.py in rsample(self, sample_shape, base_samples)
    158 
    159             # Now reparameterize those base samples
--> 160             covar_root = covar.root_decomposition().root
    161             # If necessary, adjust base_samples for rank of root decomposition
    162             if covar_root.shape[-1] < base_samples.shape[-2]:

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
     32         cache_name = name if name is not None else method
     33         if not is_in_cache(self, cache_name):
---> 34             add_to_cache(self, cache_name, method(self, *args, **kwargs))
     35         return get_from_cache(self, cache_name)
     36 

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in root_decomposition(self)
   1320                 )
   1321 
-> 1322         res = self._root_decomposition()
   1323         return RootLazyTensor(res)
   1324 

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/lazy/block_diag_lazy_tensor.py in _root_decomposition(self)
     65 
     66     def _root_decomposition(self):
---> 67         return self.__class__(self.base_lazy_tensor._root_decomposition())
     68 
     69     def _root_inv_decomposition(self, initial_vectors=None):

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _root_decomposition(self)
    577             batch_shape=self.batch_shape,
    578             matrix_shape=self.matrix_shape,
--> 579         )(*self.representation())
    580         return res
    581 

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/functions/_root_decomposition.py in forward(self, *matrix_args)
     51             matrix_shape=self.matrix_shape,
     52             batch_shape=self.batch_shape,
---> 53             init_vecs=self.initial_vectors,
     54         )
     55 

~/.local/share/virtualenvs/bo-benchmarks-uT2BAiPg/lib/python3.6/site-packages/gpytorch/utils/lanczos.py in lanczos_tridiag(matmul_closure, max_iter, dtype, device, matrix_shape, batch_shape, init_vecs, num_init_vecs, tol)
     80     # Copy over alpha_0 and beta_0 to t_mat
     81     t_mat[0, 0].copy_(alpha_0)
---> 82     t_mat[0, 1].copy_(beta_0)
     83     t_mat[1, 0].copy_(beta_0)
     84 

IndexError: index 1 is out of bounds for dimension 1 with size 1

System information

Please complete the following information:

  • botorch==0.1.2
  • gpytorch==0.3.4
  • torch==1.1.0
  • MacOS 10.14.5

Inefficiency in generating initial conditions while using optimize_acqf with fixed_features

Issue description

While using optimize_acqf with fixed_features, the initial condition generation does not account for the fixed_features; they are only applied later during optimization with gen_candidates_scipy. I believe the quality of the initial conditions could be improved by applying fixed_features during initial condition generation as well.

Code example

The following code can be used in debug mode, with a breakpoint at the forward pass of qKnowledgeGradient, to observe the behavior. The first call of forward will not have X[:, :, 0] = 0.5 fixed (this is the initial condition generation); later calls will have it fixed.

import torch
from botorch.acquisition import qKnowledgeGradient
from botorch.fit import fit_gpytorch_model
from botorch.models import SingleTaskGP
from botorch.models.transforms import Standardize
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood
from torch import Tensor

train_X = torch.rand((10, 2))
train_Y = torch.sum(train_X, dim=-1, keepdim=True) + torch.randn(10, 1) * 0.1

gp = SingleTaskGP(train_X, train_Y, outcome_transform=Standardize(m=1))
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)

bounds = Tensor([[0], [1]]).repeat(1, 2)
acqf = qKnowledgeGradient(gp)
fixed_features = {0: 0.5}

solution, value = optimize_acqf(acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=100, fixed_features=fixed_features)

print("solution = %s" % solution)

System Info

Please provide information about your setup, including

  • BoTorch Version: up to date with master
  • GPyTorch Version: up to date with master
  • PyTorch Version: 1.3.1
  • Computer OS: macOS Catalina 10.15.2
  • Python: 3.7.5 on Anaconda

ImportError: cannot import name 'optimize_acqf'

๐Ÿ› Bug

Tried adapting https://github.com/pytorch/botorch/blob/master/tutorials/closed_loop_botorch_only.ipynb and got an ImportError on the following import:
from botorch.optim import optimize_acqf

BATCH_SIZE = 3
bounds = torch.tensor([[0.0] * 6, [1.0] * 6], device=device, dtype=dtype)

To reproduce

from botorch.optim import optimize_acqf fails but from botorch.optim import joint_optimize works

gen_candidates_torch evaluates the acquisition function twice per iteration

In botorch/gen.py, gen_candidates_torch evaluates loss / the acquisition function twice per iteration: Once at line 206 for reporting / book keeping purposes, and once at line 218 for the optimizer iteration.

In line 218, bayes_optimizer.step(closure) returns the loss, which can be used for reporting / book keeping purposes without the need to evaluate it again at line 206. This would halve the number of acquisition function evaluations needed.
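A minimal sketch of the suggested pattern (generic torch.optim usage, not the exact botorch/gen.py code): the loss returned by step(closure) is reused for book keeping, so the acquisition function is evaluated only once per iteration.

import torch

def optimize_candidates(acq_function, X_init, steps=50, lr=0.01):
    X = X_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([X], lr=lr)
    losses = []
    for _ in range(steps):
        def closure():
            optimizer.zero_grad()
            loss = -acq_function(X).sum()
            loss.backward()
            return loss
        loss = optimizer.step(closure)  # step() returns the closure's loss
        losses.append(loss.item())      # reuse it -- no second evaluation needed
    return X.detach(), losses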

[Bug] autograd of posterior mean w.r.t. x seems unstable

๐Ÿ› Bug

I am trying to compute the norm of the gradient of the posterior mean, but I find that different runs of autograd give different results.
I am not sure whether this is a general problem with autograd or specific to the model.posterior(x).mean function.
Note that the eigenvalues of the kernel matrix show that it is not ill-conditioned.

To reproduce

** Code snippet to reproduce **

import pickle
from copy import deepcopy
from typing import Optional

import numpy as np
import torch
from torch import Tensor

from gpytorch.constraints.constraints import GreaterThan
from gpytorch.distributions.multivariate_normal import MultivariateNormal
from gpytorch.kernels.matern_kernel import MaternKernel
from gpytorch.kernels.scale_kernel import ScaleKernel
from gpytorch.likelihoods.gaussian_likelihood import (
    FixedNoiseGaussianLikelihood,
    GaussianLikelihood,
    _GaussianLikelihoodBase,
)
from gpytorch.likelihoods.likelihood import Likelihood
from gpytorch.likelihoods.noise_models import HeteroskedasticNoise
from gpytorch.means.constant_mean import ConstantMean
from gpytorch.models.exact_gp import ExactGP
from gpytorch.priors.smoothed_box_prior import SmoothedBoxPrior
from gpytorch.priors.torch_priors import GammaPrior

# from botorch.models import SingleTaskGP
from botorch.models.gpytorch import BatchedMultiOutputGPyTorchModel
# helper used in __init__ below (located in botorch.models.utils in the 0.1.x releases)
from botorch.models.utils import multioutput_to_batch_mode_transform

MIN_INFERRED_NOISE_LEVEL = 1e-6
class SingleTaskGP(ExactGP, BatchedMultiOutputGPyTorchModel):
    r"""A single-task Exact GP model.

    A single-task exact GP using relatively strong priors on the Kernel
    hyperparameters, which work best when covariates are normalized to the unit
    cube and outcomes are standardized (zero mean, unit variance).

    This model works in batch mode (each batch having its own hyperparameters).
    When the training observations include multiple outputs, this model will use
    batching to model outputs independently.

    Use this model when you have independent output(s) and all outputs use the same
    training data. If outputs are independent and outputs have different training
    data, use the ModelListGP. When modeling correlations between outputs, use
    the MultiTaskGP.
    """

    def __init__(
        self, train_X: Tensor, train_Y: Tensor, likelihood: Optional[Likelihood] = None
    ) -> None:
        r"""A single-task Exact GP model.

        Args:
            train_X: A `n x d` or `batch_shape x n x d` (batch mode) tensor of training
                features.
            train_Y: A `n x (o)` or `batch_shape x n x (o)` (batch mode) tensor of
                training observations.
            likelihood: A likelihood. If omitted, use a standard
                GaussianLikelihood with inferred noise level.

        Example:
            >>> train_X = torch.rand(20, 2)
            >>> train_Y = torch.sin(train_X[:, 0]) + torch.cos(train_X[:, 1])
            >>> model = SingleTaskGP(train_X, train_Y)
        """
        ard_num_dims = train_X.shape[-1]
        self._set_dimensions(train_X=train_X, train_Y=train_Y)
        train_X, train_Y, _ = multioutput_to_batch_mode_transform(
            train_X=train_X, train_Y=train_Y, num_outputs=self._num_outputs
        )
        if likelihood is None:
            noise_prior = GammaPrior(1.1, 0.05)
            noise_prior_mode = (noise_prior.concentration - 1) / noise_prior.rate
            likelihood = GaussianLikelihood(
                noise_prior=noise_prior,
                batch_shape=self._aug_batch_shape,
                noise_constraint=GreaterThan(
                    MIN_INFERRED_NOISE_LEVEL,
                    transform=None,
                    initial_value=noise_prior_mode,
                ),
            )
        else:
            self._likelihood_state_dict = deepcopy(likelihood.state_dict())
        super().__init__(train_X, train_Y, likelihood)
        self.mean_module = ConstantMean(batch_shape=self._aug_batch_shape)
        self.covar_module = ScaleKernel(
            MaternKernel(
                nu=2.5,
                ard_num_dims=ard_num_dims,
                batch_shape=self._aug_batch_shape,
                lengthscale_prior=GammaPrior(3.0, 6.0),
            ),
            batch_shape=self._aug_batch_shape,
            outputscale_prior=GammaPrior(2.0, 1),
        )
        self.to(train_X)

    def forward(self, x: Tensor) -> MultivariateNormal:
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)

with open("../debug_autograd.dat", "rb") as f:  # unzip the attached .zip file 
    data = pickle.load(f)
model_state = data["model_state"]
X = data["X"][0]
Y = data["Y"]
x = data["x"]
# print(x, x.shape)
model = SingleTaskGP(X, Y)
model.load_state_dict(model_state)
posterior_mean = model.posterior(x).mean
# print(posterior_mean)
num_repeats = 20
grads = [0] * num_repeats

for i in range(num_repeats):
    grads[i] = torch.autograd.grad(
        posterior_mean.squeeze(),
        x,
        grad_outputs=torch.Tensor([1.0] * len(posterior_mean)),
        create_graph=True,
    )[0]
    print(f"norm: {grads[i].norm().detach():<17}, grad: {grads[i].detach().numpy()}")


# see if the kernel matrix is ill-conditioned
K = model.covar_module(X,X).evaluate().detach().numpy()
print("eigenvalues of kernel matrix:\n", np.linalg.eig(K)[0])

** Stack trace/error message **

norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: inf              , grad: [[-5.7399116e+23  7.5462431e+23  5.5454977e+23 -3.3287596e+23]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: inf              , grad: [[-5.7399116e+23  7.5462431e+23  5.5454977e+23 -3.3287596e+23]]
norm: inf              , grad: [[-3.7144462e+22  4.7384191e+22 -2.0683729e+23 -2.0664616e+22]]
norm: inf              , grad: [[-5.5315715e+22  6.1239115e+22  1.7078501e+22 -2.9080216e+21]]
norm: inf              , grad: [[-6.7333276e+20  7.3049702e+20  7.0442505e+20  4.2533686e+18]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: 13.015168190002441, grad: [[ 1.0258255  -3.4464521  12.508536    0.02787224]]
norm: inf              , grad: [[-1.4869729e+22  1.1159501e+22 -2.7945394e+22  9.8531644e+19]]
norm: inf              , grad: [[-3.4760918e+21 -1.2014268e+21 -3.9865152e+22  2.6559295e+19]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]
norm: inf              , grad: [[-3.3196202e+23  9.3569012e+23  3.5546688e+23 -3.9142683e+23]]
norm: inf              , grad: [[-3.4760918e+21 -1.2014268e+21 -3.9865152e+22  2.6559295e+19]]
norm: inf              , grad: [[-3.6437675e+20  3.4892328e+20 -2.0767334e+21 -1.3027703e+20]]
norm: 3.165595293045044, grad: [[ 0.31403214 -0.4016987   3.1242533  -0.0075816 ]]

eigenvalues of kernel matrix:
 [2.4965413  1.4645283  1.1193455  1.0887814  0.24315852 0.42903015
 0.40263683 0.7568799  0.62619835 0.6405027  0.7195643 ]

System information

Please complete the following information:

  • BoTorch: 0.1.3
  • GPyTorch: 0.3.5
  • PyTorch: 1.2.0
  • OS: Ubuntu 18.04.3 LTS (Bionic Beaver)

debug_autograd.zip

[Feature Request] Make gen_batch_initial_conditions and optimize_acqf work with data that has more than 1111 dimensions

๐Ÿš€ Feature Request

Being able to use gen_batch_initial_conditions and optimize_acqf with data that has more than 1111 dimensions.

Motivation

Currently you will get an error like this

gpapp_1  |   File "/app/fulloptim.py", line 96, in get_candidate
gpapp_1  |     acq_f, bounds=bounds, q=2, num_restarts=5, raw_samples=5,
gpapp_1  |   File "/opt/conda/lib/python3.6/site-packages/botorch/optim/optimize.py", line 136, in optimize_acqf
gpapp_1  |     options=options,
gpapp_1  |   File "/opt/conda/lib/python3.6/site-packages/botorch/optim/optimize.py", line 308, in gen_batch_initial_conditions
gpapp_1  |     seed=seed,
gpapp_1  |   File "/opt/conda/lib/python3.6/site-packages/botorch/utils/sampling.py", line 143, in draw_sobol_samples
gpapp_1  |     sobol_engine = SobolEngine(d, scramble=True, seed=seed)
gpapp_1  |   File "/opt/conda/lib/python3.6/site-packages/torch/quasirandom.py", line 47, in __init__
gpapp_1  |     "for SobolEngine is [1, {}]".format(self.MAXDIM))
gpapp_1  | ValueError: Supported range of dimensionality for SobolEngine is [1, 1111]

The only way to circumvent it is to build your initial conditions yourself and pass them to optimize_acqf.

Pitch

Instead of raising an error, initial conditions could be created randomly (within the bounds) with something other than draw_sobol_samples when the dimensionality of the data is more than 1111 (Sobol's limit). A sketch of such a fallback follows below.
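For illustration, a minimal sketch of such a fallback (a hypothetical helper, not part of BoTorch):

import torch

def draw_uniform_samples(bounds, n, q, seed=None):
    # bounds: 2 x d tensor of lower/upper bounds; returns an n x q x d tensor
    d = bounds.shape[-1]
    generator = torch.Generator().manual_seed(seed) if seed is not None else None
    u = torch.rand(n, q, d, generator=generator, dtype=bounds.dtype)
    return bounds[0] + (bounds[1] - bounds[0]) * u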

[Bug] IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)

๐Ÿ› Bug

After following the conda installation instructions, I tried to run the example "2. Fit a model" from the botorch.org front page in a Jupyter notebook and got the following error.

To reproduce

** Code snippet to reproduce **

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(10, 2)
Y = 1 - torch.norm(train_X - 0.5, dim=-1) + 0.1 * torch.rand(10)
train_Y = (Y - Y.mean()) / Y.std()

gp = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)

** Stack trace/error message **

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-fd0caaa8f8fc> in <module>
      8 train_Y = (Y - Y.mean()) / Y.std()
      9 
---> 10 gp = SingleTaskGP(train_X, train_Y)
     11 mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
     12 fit_gpytorch_model(mll)

~/miniconda3/envs/j/lib/python3.7/site-packages/botorch/models/gp_regression.py in __init__(self, train_X, train_Y, likelihood, covar_module)
     77             >>> model = SingleTaskGP(train_X, train_Y)
     78         """
---> 79         validate_input_scaling(train_X=train_X, train_Y=train_Y)
     80         self._validate_tensor_args(X=train_X, Y=train_Y)
     81         self._set_dimensions(train_X=train_X, train_Y=train_Y)

~/miniconda3/envs/j/lib/python3.7/site-packages/botorch/models/utils.py in validate_input_scaling(train_X, train_Y, train_Yvar, raise_on_fail)
    219             raise InputDataError("Input data contains negative variances.")
    220     check_min_max_scaling(X=train_X, raise_on_fail=raise_on_fail)
--> 221     check_standardization(Y=train_Y, raise_on_fail=raise_on_fail)

~/miniconda3/envs/j/lib/python3.7/site-packages/botorch/models/utils.py in check_standardization(Y, atol_mean, atol_std, raise_on_fail)
    172     """
    173     with torch.no_grad():
--> 174         Ymean, Ystd = torch.mean(Y, dim=-2), torch.std(Y, dim=-2)
    175         if torch.abs(Ymean).max() > atol_mean or torch.abs(Ystd - 1).max() > atol_std:
    176             msg = (

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)

Expected Behavior

The example should run

System information

Please complete the following information:

  • BoTorch Version (0.1.4)
  • GPyTorch Version (0.3.6)
  • PyTorch Version (1.3.1)
  • centos 8
  • Linux localhost.localdomain 4.18.0-80.11.2.el8_0.x86_64 #1 SMP Tue Sep 24 11:32:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • conda (4.7.12)

Additional context

(j) [user@localhost ~]$ conda install botorch -c pytorch -c gpytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/user/miniconda3/envs/j

  added / updated specs:
    - botorch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    botorch-0.1.4              |                0          96 KB  pytorch
    cffi-1.13.1                |   py37h2e261b9_0         224 KB
    cudatoolkit-10.1.243       |       h6bb024c_0       347.4 MB
    gpytorch-0.3.6             |                0         180 KB  gpytorch
    ninja-1.9.0                |   py37hfd86e86_0         1.2 MB
    pytorch-1.3.1              |py3.7_cuda10.1.243_cudnn7.6.3_0       428.0 MB  pytorch
    ------------------------------------------------------------
                                           Total:       777.1 MB

The following NEW packages will be INSTALLED:

  botorch            pytorch/noarch::botorch-0.1.4-0
  cffi               pkgs/main/linux-64::cffi-1.13.1-py37h2e261b9_0
  cudatoolkit        pkgs/main/linux-64::cudatoolkit-10.1.243-h6bb024c_0
  gpytorch           gpytorch/noarch::gpytorch-0.3.6-0
  ninja              pkgs/main/linux-64::ninja-1.9.0-py37hfd86e86_0
  pycparser          pkgs/main/linux-64::pycparser-2.19-py37_0
  pytorch            pytorch/linux-64::pytorch-1.3.1-py3.7_cuda10.1.243_cudnn7.6.3_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
ninja-1.9.0          | 1.2 MB    | ####################################### | 100%
cffi-1.13.1          | 224 KB    | ####################################### | 100%
botorch-0.1.4        | 96 KB     | ####################################### | 100%
pytorch-1.3.1        | 428.0 MB  | ####################################### | 100%
cudatoolkit-10.1.243 | 347.4 MB  | ####################################### | 100%
gpytorch-0.3.6       | 180 KB    | ####################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

[Bug] posterior sampling produces NaN

๐Ÿ› Bug

posterior sampling produced NaN

To reproduce

unzip the attached data, then run the pasted code below

import pickle
import torch

with open('posterior_sample_get_nan_bug', 'rb') as f:
    post = pickle.load(f)
posterior = post['posterior']
samples = posterior.rsample(sample_shape=post['sample_shape'],
                            base_samples=post['base_samples'])
print(torch.isnan(samples).sum())
print(samples[torch.isnan(samples)])
print(torch.nonzero(samples == float("inf")).shape)
print(samples[6216,   49,    0,    0])

Output:

tensor(2)
tensor([nan, nan])
torch.Size([72, 4])
tensor(inf)

The Inf samples lead to Inf qEI values, which makes the while loop in initializers.py never exit when using joint_optimize to optimize the qEI function:

while alpha_pos.sum() < n:
    alpha = 0.1 * alpha
    alpha_pos = Y >= alpha * max_val
posterior_sample_get_nan_bug.zip

Numerical issue with cholesky decomposition (even with normalization)

Issue description

I am consistently running into numerical issues when running fit_gpytorch_model(). I am normalizing the inputs and standardizing the outputs (as described in issue #160).

Code example

fit_gpytorch_model(mll)
EI = ExpectedImprovement(gp, best_f=0.1)

## optimize acquisition function
candidates = joint_optimize(
    acq_function=EI,
    q = 1, 
    bounds = bounds,
    num_restarts=10,
    raw_samples=500,  # used for initialization heuristic
)

new_x = candidates.detach()
exact_obj = neg_eggholder(new_x)

train_x_ei = torch.cat([train_x_ei, candidates])
train_y_ei = torch.cat([train_y_ei, exact_obj])

gp = SingleTaskGP(
    normalize(train_x_ei, bounds=bounds), 
    standardize(train_y_ei)
    )
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)

Here is the error message:

RuntimeError                              Traceback (most recent call last)
in <module>
      2
      3
----> 4 fit_gpytorch_model(mll)
      5 EI = ExpectedImprovement(gp, best_f=0.1)
      6

C:\Anaconda3\envs\py36_new\lib\site-packages\botorch\fit.py in fit_gpytorch_model(mll, optimizer, **kwargs)
     33     """
     34     mll.train()
---> 35     mll, _ = optimizer(mll, track_iterations=False, **kwargs)
     36     mll.eval()
     37     return mll

C:\Anaconda3\envs\py36_new\lib\site-packages\botorch\optim\fit.py in fit_gpytorch_scipy(mll, bounds, method, options, track_iterations)
    186         jac=True,
    187         options=options,
--> 188         callback=cb,
    189     )
    190     iterations = []

C:\Anaconda3\envs\py36_new\lib\site-packages\scipy\optimize\_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    599     elif meth == 'l-bfgs-b':
    600         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 601                                 callback=callback, **options)
    602     elif meth == 'tnc':
    603         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

C:\Anaconda3\envs\py36_new\lib\site-packages\scipy\optimize\lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    333             # until the completion of the current minimization iteration.
    334             # Overwrite f and g:
--> 335             f, g = func_and_grad(x)
    336         elif task_str.startswith(b'NEW_X'):
    337             # new iteration

C:\Anaconda3\envs\py36_new\lib\site-packages\scipy\optimize\lbfgsb.py in func_and_grad(x)
    283     else:
    284         def func_and_grad(x):
--> 285             f = fun(x, *args)
    286             g = jac(x, *args)
    287             return f, g

C:\Anaconda3\envs\py36_new\lib\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
    298     def function_wrapper(*wrapper_args):
    299         ncalls[0] += 1
--> 300         return function(*(wrapper_args + args))
    301
    302     return ncalls, function_wrapper

C:\Anaconda3\envs\py36_new\lib\site-packages\scipy\optimize\optimize.py in __call__(self, x, *args)
     61     def __call__(self, x, *args):
     62         self.x = numpy.asarray(x).copy()
---> 63         fg = self.fun(x, *args)
     64         self.jac = fg[1]
     65         return fg[0]

C:\Anaconda3\envs\py36_new\lib\site-packages\botorch\optim\fit.py in _scipy_objective_and_grad(x, mll, property_dict)
    221     output = mll.model(*train_inputs)
    222     args = [output, train_targets] + _get_extra_mll_args(mll)
--> 223     loss = -mll(*args).sum()
    224     loss.backward()
    225     param_dict = OrderedDict(mll.named_parameters())

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\module.py in __call__(self, *inputs, **kwargs)
     20
     21     def __call__(self, *inputs, **kwargs):
---> 22         outputs = self.forward(*inputs, **kwargs)
     23         if isinstance(outputs, list):
     24             return [_validate_module_outputs(output) for output in outputs]

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\mlls\exact_marginal_log_likelihood.py in forward(self, output, target, *params)
     26         # Get the log prob of the marginal distribution
     27         output = self.likelihood(output, *params)
---> 28         res = output.log_prob(target)
     29
     30         # Add terms for SGPR / when inducing points are learned

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\distributions\multivariate_normal.py in log_prob(self, value)
    127
    128         # Get log determininat and first part of quadratic form
--> 129         inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
    130
    131         res = -0.5 * sum([inv_quad, logdet, diff.size(-1) * math.log(2 * math.pi)])

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\lazy\lazy_tensor.py in inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
    990             from .chol_lazy_tensor import CholLazyTensor
    991
--> 992             cholesky = CholLazyTensor(self.cholesky())
    993             return cholesky.inv_quad_logdet(inv_quad_rhs=inv_quad_rhs, logdet=logdet, reduce_inv_quad=reduce_inv_quad)
    994

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\lazy\lazy_tensor.py in cholesky(self, upper)
    716             (LazyTensor) Cholesky factor (lower triangular)
    717         """
--> 718         res = self._cholesky()
    719         if upper:
    720             res = res.transpose(-1, -2)

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\utils\memoize.py in g(self, *args, **kwargs)
     32         cache_name = name if name is not None else method
     33         if not is_in_cache(self, cache_name):
---> 34             add_to_cache(self, cache_name, method(self, *args, **kwargs))
     35         return get_from_cache(self, cache_name)
     36

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\lazy\lazy_tensor.py in _cholesky(self)
    401             evaluated_mat.register_hook(_ensure_symmetric_grad)
    402
--> 403         cholesky = psd_safe_cholesky(evaluated_mat.double()).to(self.dtype)
    404         return NonLazyTensor(cholesky)
    405

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\utils\cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     45                 continue
     46
---> 47         raise e
     48
     49

C:\Anaconda3\envs\py36_new\lib\site-packages\gpytorch\utils\cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     19     """
     20     try:
---> 21         L = torch.cholesky(A, upper=upper, out=out)
     22         # TODO: Remove once fixed in pytorch (#16780)
     23         if A.dim() > 2 and A.is_cuda:
RuntimeError: cholesky_cpu: U(2,2) is zero, singular U.

System Info

BoTorch 0.1.0
GPyTorch 0.3.2
Torch 1.1.0
Windows 10

[Bug] BotorchTensorDimensionError (Example of Get Started)

๐Ÿ› Bug

To reproduce

** Code snippet to reproduce **

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(10, 2)
Y = 1 - torch.norm(train_X - 0.5, dim=-1) + 0.1 * torch.rand(10)
train_Y = (Y - Y.mean()) / Y.std()

gp = SingleTaskGP(train_X, train_Y)

** Stack trace/error message **

BotorchTensorDimensionError               Traceback (most recent call last)
in <module>
     11 train_Y = (Y - Y.mean()) / Y.std()
     12
---> 13 gp = SingleTaskGP(train_X, train_Y)
     14 mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
     15 fit_gpytorch_model(mll)

C:\Anaconda3\envs\BoTorch\lib\site-packages\botorch\models\gp_regression.py in __init__(self, train_X, train_Y, likelihood, covar_module)
     76             >>> model = SingleTaskGP(train_X, train_Y)
     77         """
---> 78         self._validate_tensor_args(X=train_X, Y=train_Y)
     79         self._set_dimensions(train_X=train_X, train_Y=train_Y)
     80         train_X, train_Y, _ = self._transform_tensor_args(X=train_X, Y=train_Y)

C:\Anaconda3\envs\BoTorch\lib\site-packages\botorch\models\gpytorch.py in _validate_tensor_args(X, Y, Yvar, strict)
     73                     f" {Y.dim()}."
     74                 )
---> 75                 raise BotorchTensorDimensionError(message)
     76             else:
     77                 warnings.warn(

BotorchTensorDimensionError: An explicit output dimension is required for targets. Expected Y with dimension: 1 (got 2).

Expected Behavior

Hope this test case runs correctly.
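For reference, based on the error message above, a minimal workaround sketch (assuming the newer API that requires an explicit output dimension for targets) is to add a trailing output dimension to train_Y:

train_Y = train_Y.unsqueeze(-1)  # shape (10, 1): explicit output dimension
gp = SingleTaskGP(train_X, train_Y)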

System information

Please complete the following information:

  • BoTorch 0.1.3
  • GPyTorch 0.3.5
  • PyTorch 1.2.0
  • Computer OS: Windows 7 SP1

[Feature Request] Add an option to specify gen_candidates method in joint_optimize

๐Ÿš€ Feature Request

Add an argument in joint_optimize to specify the gen_candidates method - so you can easily use e.g. gen_candidates_torch instead of gen_candidates_scipy.

Motivation

It often makes sense to use a faster candidate generation method than the default, which is gen_candidates_scipy. I find myself needing to use gen_candidates_torch instead for bigger models (especially when dealing with higher dimensional data).

This would match other parts of the API - for example fit_gpytorch_model where you can easily pass optimizer=fit_gpytorch_torch instead of the default fit_gpytorch_scipy.

Pitch

Something like

def joint_optimize(
    acq_function: AcquisitionFunction,
    bounds: Tensor,
    q: int,
    num_restarts: int,
    raw_samples: int,
    options: Optional[Dict[str, Union[bool, float, int]]] = None,
    inequality_constraints: Optional[List[Tuple[Tensor, Tensor, float]]] = None,
    equality_constraints: Optional[List[Tuple[Tensor, Tensor, float]]] = None,
    fixed_features: Optional[Dict[int, float]] = None,
    post_processing_func: Optional[Callable[[Tensor], Tensor]] = None,
    gen_candidates: Optional[Callable[[Tensor], Tensor]] = gen_candidates_scipy
) -> Tensor:
...

If the feature is accepted, I can submit a PR for it.

Optimizing over discrete parameter domains

Can BoTorch be used over discrete parameter domains?
(if so, then this is a feature inquiry, and not a feature "request")

We have a use case of domains which are partly continuous, partly discrete, like:
[{"name": "param1", "type": "continuous", "domain": [-5, 10]}, {"name": "param2", "type": "continuous", "domain": [1, 15]}, {"name": "param3", "type": "discrete", "domain": [1, 1.5, 2, 2.5, 3, 3.5, 4]}]

The functions under "botorch/optim/optimize.py" accept an argument called "bounds", which you define as : "bounds: A 2 x d tensor of lower and upper bounds for each column of X".

These are obviously bounds for a continuous search space. Can BoTorch be used for searching over discrete spaces?

Thank you so much for the package!
Avi

ApproximateGP compatibility

๐Ÿ› Bug: gpytorch.models.ApproximateGP compatibility

To reproduce

** Code snippet to reproduce **

import gpytorch, torch, numpy as np, matplotlib.pyplot as plt, tqdm, botorch
from scipy.interpolate import griddata

class Model(botorch.models.gpytorch.BatchedMultiOutputGPyTorchModel, gpytorch.models.ApproximateGP):

    def __init__(self, idim, grid_size):

        '''
        Variational GP.

        idim: input dimensions
        grid_size: number of inducing points
        '''

        # variational distribution and strategy
        vdist = gpytorch.variational.CholeskyVariationalDistribution(grid_size)
        vstra = gpytorch.variational.VariationalStrategy(
            self,
            torch.rand(grid_size, idim),
            vdist,
            learn_inducing_locations=True
        )

        # become an approximate GP
        super().__init__(vstra)
        
        # mean
        self.mean = gpytorch.means.ConstantMean()

        # covariance (kernel)
        self.cov = gpytorch.kernels.MaternKernel(ard_num_dims=2)
        self.cov = gpytorch.kernels.ScaleKernel(self.cov)

        # likelihood
        self.likelihood = gpytorch.likelihoods.GaussianLikelihood()

        # loss record
        self.loss = list()

    def forward(self, x):

        '''
        Returns a multivariate normal distribution,
        parameterised by the mean and covariance.

        x: inputs
        '''

        m = self.mean(x)
        c = self.cov(x)
        return gpytorch.distributions.MultivariateNormal(m, c)

    def fit(self, xtrn, xstd, ytrn, lr=1e-1, epo=None, verbose=True):

        '''
        Optimises the hyperparameters of the GP
        with noisy inputs and noisy outputs.
        Accommodates noisy inputs by randomly
        sampling them at every iteration.

        xtrn: means of training data
        xstd: standard deviation of training data
        ytrn: noisy output targets
        lr: learning rate
        epo: number of training iterations
        '''

        # preprocess
        botorch.models.utils.validate_input_scaling(xtrn, ytrn)
        self._validate_tensor_args(xtrn, ytrn)
        self._set_dimensions(xtrn, ytrn)
        xtrn, ytrn, _  = self._transform_tensor_args(xtrn, ytrn)

        # let's train MFer
        self.train()
        self.likelihood.train()

        # optimisation algorithm
        opt = torch.optim.Adam(self.parameters(), lr=lr)

        # loss function
        mll = gpytorch.mlls.PredictiveLogLikelihood(
            self.likelihood, self, xtrn.size(-2)
        )

        # convergence checker
        convergence = botorch.optim.utils.ConvergenceCriterion()

        # progress bar
        pb = tqdm.tqdm(range(epo))

        # training iterations
        for i in pb:

            # zero gradients
            opt.zero_grad()

            # sample noisy inputs
            xsmp = torch.distributions.Normal(xtrn, xstd).rsample()

            # make prediction
            output = self(xsmp)

            # loss
            loss = -mll(output, ytrn)

            # compute gradient
            loss.backward()

            # optimisation step
            opt.step()

            # record loss
            self.loss.append(loss.item())

            # set progress bar description
            if verbose: pb.set_description('Loss {}'.format(loss.item()))

            # if converged
            if convergence.evaluate(loss.detach()):
                break

        # done training :)
        self.eval()
        self.likelihood.eval()

class Decider(object):

    def __init__(self, objective, model):

        # underlying objective function
        self.objective = objective

        # the underlying model
        self.model = model

        # observations
        self.xsmean = torch.empty(torch.Size([0, 2]))
        self.xsstd  = torch.empty(torch.Size([0, 1]))
        self.f = torch.empty(torch.Size([0, 1]))
        self.fmax = torch.empty(torch.Size([0, 1]))

    def decide(self, x, xstd, lr=1e-1):

        # sample the objective function
        obj = self.objective(
            torch.distributions.Normal(x, xstd).rsample()
        )

        # record sample
        self.xsmean = torch.cat((self.xsmean, x))
        self.xsstd = torch.cat((self.xsstd, xstd))
        self.f = torch.cat((self.f, obj))
        self.fmax = torch.cat((self.fmax, self.f.max().view(-1,1)))

        # fit the model to the observations
        self.model.fit(self.xsmean, self.xsstd, self.f, lr=lr, epo=1000, verbose=True)

        # initialise acquisition function
        acq = botorch.acquisition.ProbabilityOfImprovement(
            self.model,
            self.f.max()
        )

        # optimise acquisition function
        x, ximp = botorch.optim.optimize_acqf(
            acq,
            torch.tensor([[0.0, 0.0], [1.0, 1.0]]),
            1,
            2,
            500
        )

        # return proposed sample and improvement
        return x, xstd, ximp

    def decision_process(self, x, xstd, ndec=4, lr=1e-2):

        # decision making loop
        for _ in range(ndec):

            # get sample candidate and improvement
            x, xstd, ximp = self.decide(x, xstd, lr=lr)

def franke(x, noise=0.02):
    n, d = x.shape
    X, Y = x[:,0].view(n, 1), x[:,1].view(n, 1)
    term1 = .75*torch.exp(-((9*X - 2).pow(2) + (9*Y - 2).pow(2))/4)
    term2 = .75*torch.exp(-((9*X + 1).pow(2))/49 - (9*Y + 1)/10)
    term3 = .5*torch.exp(-((9*X - 7).pow(2) + (9*Y - 3).pow(2))/4)
    term4 = .2*torch.exp(-(9*X - 4).pow(2) - (9*Y - 7).pow(2))
    f = term1 + term2 + term3 - term4
    noise = torch.randn(f.size())*noise
    return f + noise

def botst():

    # initial sample nump_points=1 x input_dim=2
    x = torch.rand(1,2).uniform_(0, 1)
    # initial sample num_points=1 x 1
    xstd = torch.rand(1,1).uniform_(0, 0.01)

    # instantiate Bayesian decider
    decider = Decider(franke, Model(2, 10))

    # make decision
    decider.decision_process(x, xstd, ndec=400, lr=1e-1)

    # test points
    n = 50
    xv, yv = torch.meshgrid([torch.linspace(0, 1, n), torch.linspace(0, 1, n)])
    xtst = torch.cat((
        xv.contiguous().view(xv.numel(), 1),
        yv.contiguous().view(yv.numel(), 1)),
        dim=1
    )
    ytst = franke(xtst, noise=0)

    # plot ground truth, learnt model + sampled points, max objective
    fig, ax = plt.subplots(3)

    x, y = np.meshgrid(xtst[:,0], xtst[:,1])
    z = griddata(
        (xtst[:,0], xtst[:,1]),
        ytst.flatten(),
        (x, y),
        method='linear'
    )
    ax[0].contourf(x, y, z)
    ax[0].set_title('Ground truth')

    with torch.no_grad():
        x, y = np.meshgrid(xtst[:,0], xtst[:,1])
        res = decider.model(xtst)
        res = decider.model.likelihood(res)
        z = griddata(
            (xtst[:,0], xtst[:,1]),
            res.mean,
            (x, y),
            method='linear'
        )
        ax[1].contourf(x, y, z)
        ax[1].plot(decider.xsmean[:,0], decider.xsmean[:,1], 'kx')
        ax[1].set_title('Gaussian process')

    ax[2].plot(decider.fmax[:,0], 'k-')

    plt.tight_layout()

    plt.show()

if __name__ == '__main__':
    botst()

** Stack trace/error message **

Loss 1.4877734184265137:   0%|                                                                                                           | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "bug.py", line 256, in <module>
    botst()
  File "bug.py", line 210, in botst
    decider.decision_process(x, xstd, ndec=400, lr=1e-1)
  File "bug.py", line 186, in decision_process
    x, xstd, ximp = self.decide(x, xstd, lr=lr)
  File "bug.py", line 174, in decide
    500
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/optim/optimize.py", line 170, in optimize_acqf
    fixed_features=fixed_features,
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/gen.py", line 123, in gen_candidates_scipy
    options={k: v for k, v in options.items() if k != "method"},
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/scipy/optimize/_minimize.py", line 600, in minimize
    callback=callback, **options)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 326, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 64, in __call__
    fg = self.fun(x, *args)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/gen.py", line 110, in f
    loss = -acquisition_function(X_fix).sum()
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/utils/transforms.py", line 141, in decorated
    return method(cls, X)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/acquisition/analytic.py", line 231, in forward
    posterior = self._get_posterior(X=X)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/acquisition/analytic.py", line 62, in _get_posterior
    posterior = self.model.posterior(X)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/botorch/models/gpytorch.py", line 255, in posterior
    mvn = self(X)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/gpytorch/models/approximate_gp.py", line 81, in __call__
    return self.variational_strategy(inputs, prior=prior)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/gpytorch/variational/variational_strategy.py", line 165, in __call__
    return super().__call__(x, prior=prior)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/gpytorch/variational/_variational_strategy.py", line 127, in __call__
    variational_inducing_covar=variational_dist_u.lazy_covariance_matrix,
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/gpytorch/module.py", line 24, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/cisprague/anaconda3/lib/python3.7/site-packages/gpytorch/variational/variational_strategy.py", line 100, in forward
    interp_term = torch.triangular_solve(induc_data_covar.double(), L, upper=False)[0].to(full_inputs.dtype)
RuntimeError: The size of tensor a (2) must match the size of tensor b (500) at non-singleton dimension 0

Expected Behavior

BoTorch should be compatible with any GPyTorch model via inheriting from botorch.models.gpytorch.GPyTorchModel, but this does not seem to work in the case of a gpytorch.models.ApproximateGP with a variational strategy. It should work like other models do.
I am trying to model a mapping from n x d heteroskedastic inputs to n x m homoskedastic outputs.

System information

Please complete the following information:

  • BoTorch version: 0.1.4
  • GPyTorch version: 0.3.6
  • Torch version: 1.3.1

Additional context

Using a mobile robot, with growing position uncertainty, to build an environmental map with sensors.

Multi-objective constrained optimization for deterministic simulations

Hello! I recently came across BoTorch and became very interested. I have been working for some years now with surrogate techniques such as Kriging to optimize expensive black-box functions. In my case, expensive simulations (which take days to complete, but that I can have several running in parallel).

Brief description of my problem

  • Multi-objective constrained optimization with 2 to 6 continuous variables.
  • Possibility to run ~10 evaluations in parallel (great for multipoint acquisition functions).
  • Noise-free evaluations because they come from a deterministic simulation.
  • All objectives and constraints can be obtained from the same simulation results.

Question

I have got BoTorch running for some test cases, but I was wondering what the best way is to implement the following features within your framework:

  • What is the correct way to implement the Gaussian Processes with deterministic noise-free evaluations? Should I use a FixedGaussianNoise likelihood with zero noise?
  • Have you implemented multi-objective acquisition functions that do not construct pseudo single-objective functions (i.e. do not use weights)? What I mean by this is that I'm interested in techniques that work with Pareto fronts, like acquisition functions that aim at increasing the volume of the Pareto set.
  • What is the way to optimize an acquisition function subject to nonlinear constraints? As a first cut I could design a metric that accounts for the "goodness" of a solution in terms of meeting a constraint, and then use that metric in a multi-objective or weighted-objective optimization. However, maybe you have other suggestions.

Thank you very much for the great package. Looking forward to contributing in the future.

qKnowledgeGradient doesn't support FixedNoiseGaussianLikelihood

If we use a model with FixedNoiseGaussianLikelihood, then it will raise the exception:
raise RuntimeError("FixedNoiseGaussianLikelihood.fantasize requires a 'noise' kwarg"),

This is because noise is not passed in by the qKG implementation:

fantasy_model = self.model.fantasize(
    X=X_actual, sampler=self.sampler, observation_noise=True
)

Is there an easy way to get around this?

[Question] Model uncertainty in the presence of missing data

๐Ÿš€ Feature Request

Hello,

This looks like a really cool BO project. I'm really glad a BO project is being built on Pytorch. I need some clarification around modeling uncertainty with GPytorch.

Motivation

I have had difficulty in getting models from GPytorch to accurately predict epistemic uncertainty. Generally I see 1D regression tutorials with evenly spaced toy data sets that don't give the opportunity to show whether or not the model's uncertainty grows in regions where data is lacking like I'd expect. I noticed in your example this is also the case. Is your wrapped GPytorch model using advanced settings to ensure an accurate and expressive posterior that doesn't underestimate epistemic uncertainty?

Pitch

It would be very helpful to have some discussion on expressive posteriors, especially if the model has limitations for data sets that have regions of missing data. If the model is capable of accurate epistemic uncertainty predictions, it would be helpful to use an example that shows this more clearly. Ultimately, I think that models being able to tell me what they don't know is one of the most important things for BO.

Thanks,
Michael Huang

[Feature Request] Implement most likely heteroskedastic GP and write a tutorial notebook

๐Ÿš€ Feature Request

I would like to submit a potential example notebook, based on my following work here:
https://colab.research.google.com/drive/1dOUHQzl3aQ8hz6QUtwRrXlQBGqZadQgG
(hopefully it is easy for others to run in Colab!)

The notebook details the use of a heteroskedastic GP in BoTorch. However, before finalizing the notebook and cleaning it up a bit, there are a few things that I believe could be changed to help improve the entire training process (also described in notebook comments). I apologize if this reads as many features requests/issues in one -- but I am happy to contribute pull requests and please let me know what you all think / if I am totally wrong about something. I can also open separate feature requests for each statement if you agree on the need.

Listed in roughly decreasing order of perceived importance.

  1. I would love to see the implementation of an mll.model.predictive_posterior function. Since the mll.model.posterior(X_test).mvn.confidence_region() function currently only gives the posterior for the function mean, this can be confusing to those who think they are getting the predictive posterior (as is shown when using confidence_region in GPyTorch tutorials, for example). Currently, for simple models the predictive posterior is easy to get through mll.likelihood(mll.model(X_test)).confidence_region(). However, this does not work for the HeteroskedasticSingleTaskGP model, for which I could only obtain the prediction intervals using the following code:
mll.eval()
with torch.no_grad():
    posterior = mll.model.posterior(X_test)

    predictive_noise = torch.exp(mll.model.likelihood.noise_covar.noise_model.posterior(X_test).mean)
    # get standard deviation
    predictive_noise_std = torch.sqrt(predictive_noise).squeeze(-1)

    # fit posterior confidence
    lower, upper = posterior.mvn.confidence_region()
    # get posterior predictive confidence by adding noise from noise model
    lower_predictive = lower - 2*predictive_noise_std
    upper_predictive = upper + 2*predictive_noise_std

which is quite burdensome. (Note that the noise model of HeteroskedasticSingleTaskGP currently returns the log of the variance, thus the exponentiation. I believe this may be an error in the noise model.)

In summary, I believe a predictive_posterior function is often what people are after and would be a nice complement to posterior. Furthermore, it would be nice to standardize the methods of obtaining prediction intervals across tasks since the same method does not currently work for all models.

  2. I do believe there should probably be a HeteroskedasticGP class that merely accepts train_x, train_y, and does not require the input of train_yvar. This can get rather complicated, but a simple solution for now would be to initially create a SingleTaskGP that is used to estimate train_Yvar once trained. The observed variances could then be fed into the current implementation of HeteroskedasticSingleTaskGP along with train_x, train_y, as is done in the notebook (a rough sketch of this two-stage idea follows below). This is perhaps the more controversial of the feature requests.
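A rough sketch of the two-stage idea above (my own illustration, not an official API; the function name and the residual-based variance estimate are assumptions):

import torch
from botorch.models import SingleTaskGP, HeteroskedasticSingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

def fit_two_stage_heteroskedastic_gp(train_X, train_Y):
    # stage 1: homoskedastic fit
    gp = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    # crude per-point variance estimate: squared residuals of the posterior mean
    with torch.no_grad():
        residuals = train_Y - gp.posterior(train_X).mean
    train_Yvar = residuals.pow(2).clamp_min(1e-6)
    # stage 2: feed the estimated variances into the existing model
    hetero_gp = HeteroskedasticSingleTaskGP(train_X, train_Y, train_Yvar)
    hetero_mll = ExactMarginalLogLikelihood(hetero_gp.likelihood, hetero_gp)
    fit_gpytorch_model(hetero_mll)
    return hetero_gp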

Please let me know if you have any other comments on the notebook, and thanks for the library!

Training heteroscedastic GP (possible bug or inconsistency)

Hi,

I am trying to train a heteroscedastic GP on high-dimensional inputs. Because scipy.optimize.minimize cannot really optimize GPs in more than a few dimensions, I'm trying to use minibatch optimizers from torch.optim such as Adam. Following the examples in the tutorials, I've created a very simple function to train a GP with torch.optim. The optimization works perfectly for the homoscedastic SingleTaskGP, but not for the heteroscedastic HeteroskedasticSingleTaskGP, where I obtain the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-4a5cc8aa17fb> in <module>()
      1 hetero_GP = HeteroskedasticSingleTaskGP(x,y,s)
----> 2 hetero_GP = train_a_GP(hetero_GP, x, y, 10)

9 frames
/usr/local/lib/python3.6/dist-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
     20 
     21     def __call__(self, *inputs, **kwargs):
---> 22         outputs = self.forward(*inputs, **kwargs)
     23         if isinstance(outputs, list):
     24             return [_validate_module_outputs(output) for output in outputs]

TypeError: forward() missing 1 required positional argument: 'x'

I have put together a simple example on this Colab notebook:
https://colab.research.google.com/drive/106oIYcd4fnidAGr2q8-R64jWy0ufu1Y8
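The training function itself is essentially the standard torch.optim loop from the tutorials; a minimal sketch (the learning rate and other details are placeholders for what is in the notebook):

import torch
from gpytorch.mlls import ExactMarginalLogLikelihood

def train_a_GP(model, train_X, train_Y, epochs):
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
    for _ in range(epochs):
        optimizer.zero_grad()
        output = model(train_X)
        # with HeteroskedasticSingleTaskGP, the TypeError above appears to be raised here
        loss = -mll(output, model.train_targets)
        loss.backward()
        optimizer.step()
    model.eval()
    return model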

Am I doing something wrong? Any help would be greatly appreciated.

Thanks and best wishes,

Miguel

[Feature Request] Add warnings if train data is not normalized/standardized

🚀 Feature Request

Let's emit a warning if input data to the standard models are not normalized/standardized.

Motivation

Our standard models assume that the input data is normalized (train inputs in the unit cube, train targets with zero mean and unit variance).
While we do indicate this in the tutorials/docstrings, a good number of the issues we see reported come down to users not normalizing/standardizing their data.
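A minimal sketch of what such a check could look like (the function name and tolerance are placeholders, not an existing BoTorch API):

import warnings
import torch

def check_standardization(train_X: torch.Tensor, train_Y: torch.Tensor, atol: float = 0.1) -> None:
    # warn if train inputs fall outside the unit cube
    if torch.any(train_X < 0) or torch.any(train_X > 1):
        warnings.warn("Train inputs are not contained in the unit cube.")
    # warn if train targets are not (approximately) zero mean, unit variance
    if train_Y.mean(dim=0).abs().max() > atol or (train_Y.std(dim=0) - 1).abs().max() > atol:
        warnings.warn("Train targets do not appear to be standardized (zero mean, unit variance).")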

discrete parameter domain of GP, high-dimensional GP, multi-output BO, output-constrained BO

🚀 Feature Request

Hi, I'm new to this package and am trying to find out whether botorch fits my use case.

What I would like to do is

  1. create a surrogate model (GP) on the sample pool, e.g., over a high-dimensional (dim. > 40), discrete parameter domain,
  2. create an acquisition function, e.g., for multi-output BO,
  3. optimize the acquisition function, e.g., with output constraints (output-constrained BO),
  4. get the next recommended sample, say X, and run a REAL experiment to obtain REAL_Y,
  5. add (X, REAL_Y) to the sample pool, and
  6. repeat steps 1-5 until some stopping criterion is satisfied (see the rough sketch right after this list).
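In botorch pseudocode, I imagine the loop would look roughly like the following (single-output EI on a continuous, normalized domain for simplicity; the discrete-domain handling, multi-output acquisition, and output constraints are exactly the parts I am unsure about, and the data/objective here are placeholders):

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import ExpectedImprovement
from botorch.optim import joint_optimize
from gpytorch.mlls import ExactMarginalLogLikelihood

def run_real_experiment(X):
    # placeholder for the REAL experiment; returns REAL_Y with shape (q, 1)
    return torch.randn(X.shape[0], 1)

bounds = torch.stack([torch.zeros(40), torch.ones(40)])  # assumes a normalized 40-dim domain
train_X = torch.rand(10, 40)          # placeholder initial sample pool
train_Y = run_real_experiment(train_X)

for _ in range(20):
    gp = SingleTaskGP(train_X, train_Y)                   # step 1: surrogate on the pool
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    acqf = ExpectedImprovement(gp, best_f=train_Y.max())  # step 2: acquisition function
    X_next = joint_optimize(                              # step 3: optimize the acquisition
        acqf, bounds=bounds, q=1, num_restarts=10, raw_samples=100
    )
    REAL_Y = run_real_experiment(X_next)                  # step 4: run the REAL experiment
    train_X = torch.cat([train_X, X_next])                # step 5: augment the sample pool
    train_Y = torch.cat([train_Y, REAL_Y])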

What I found is that discrete parameter domains are not directly supported by botorch, and that using Ax instead of botorch is recommended (#177), but I'm not sure whether Ax covers my use case.

Any comments are highly appreciated.

Sometimes evaluating same parameters

๐Ÿ› Bug

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import ExpectedImprovement
from botorch.optim import joint_optimize
from gpytorch.mlls import ExactMarginalLogLikelihood

def sample_from_gp(X, y):
    # best_f, lower_bounds, and upper_bounds are defined elsewhere in the script
    gp = SingleTaskGP(X, y)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)
    ei = ExpectedImprovement(gp, best_f=best_f, maximize=False)
    bounds = torch.tensor([lower_bounds, upper_bounds], dtype=X.dtype)  # torch.from_numpy needs an ndarray, not a list
    x = joint_optimize(ei, bounds=bounds, q=1, num_restarts=3, raw_samples=15)
    return x

The function samples the same values again and again, even though it receives an X of a different (growing) size each time.
I'm using botorch 0.1.3, pytorch 1.2.0, and python 3.7.4.

My first thought was that the GP keeps sampling the same values again and again because the current best value is clearly better than all the others. However, there are better values than the repeated ones, e.g. job_id=38 and 55 (better values) vs. job_id=41-44 and 56-57 (repeated values).

What is the problem and how can I prevent this from happening?

An example of function values on sampled parameters as follows:

job_id, function value (10d-styblinski function)
0,-100.3141088109847
1,-40.6507229209054
2,-145.12754856522128
3,-129.2896154076864
4,-177.44764475162197
5,-140.18387252214973
6,31.73638292336964
7,-168.35422686197452
8,77.72236087723707
9,-150.56561899052934
10,-171.09975497628568
11,-171.09975497628568
12,-171.09975497628568
13,-171.09975497628568
14,-171.09975497628568
15,-171.09975497628568
16,-171.09975497628568
17,-171.09975497628568
18,-171.09975497628568
19,-171.09975497628568
20,148.6724265128246
21,-171.09975497628568
22,-171.09975497628568
23,-123.33689147554634
24,-43.56836243792691
25,-171.09975497628568
26,-171.09975497628568
27,-171.09975497628568
28,-171.09975497628568
29,-171.09975497628568
30,-171.09975497628568
31,-171.09975497628568
32,-171.09975497628568
33,-171.09975497628568
34,-171.09975497628568
35,-171.09975497628568
36,-171.10602734194646
37,-171.1113286019813
38,-203.8393595853362
39,-14.999353142656219
40,-171.09975497628568
41,-161.86866751628025
42,-161.86866751628025
43,-161.86866751628025
44,-161.86866751628025
45,-34.74721773557974
46,-161.86866751628025
47,-60.85042059087937
48,-198.2800479242175
49,-198.2800479242175
50,-198.2800479242175
51,-198.2800479242175
52,-171.29510186881367
53,-198.2800479242175
54,-198.2800479242175
55,-222.93163588013937
56,-198.2800479242175
57,-198.2800479242175
58,-86.52857326362621
59,-95.46479251301344
60,-5.68860420124463
61,-45.37620554975092
62,-38.34013783934563
63,18.69600335167229
64,-208.57689263653083
65,-198.30414410590038
66,-242.1969645392432
67,-150.1231199077476
68,-64.25288043117575
69,-90.1614677750099
70,-128.2946323346996
71,70.13977382816279
72,-65.83446591966339
73,-146.08577759482296
74,-123.33689147554634
75,-95.46479251301344
76,55.53368098759756
77,-142.17147175303046
78,-164.087523894855
79,-198.2800479242175
80,-59.71922362569103
81,84.54133295514248
82,4.222878062237669
83,-28.67328599988764
84,249.5739976087075
85,-198.2800479242175
86,-86.52336704875256
87,-123.33689147554634
88,-248.57699194650786
89,-85.71092944977511
90,-76.88314918982381
91,-112.02563778449797
92,-198.2800479242175
93,-29.662138372655477
94,-156.14261430782528
95,-198.2800479242175
96,-198.2800479242175
97,122.37339426522544
98,-198.2800479242175
99,-198.2800479242175
100,-198.2800479242175

[Bug] 'symeig_cpu: the algorithm failed to converge' error when using qNEI

๐Ÿ› Bug

When using qNEI with SingleTaskGP, sometimes you will run into
RuntimeError: symeig_cpu: the algorithm failed to converge; 2 off-diagonal elements of an intermediate tridiagonal form did not converge to zero.

I believe it happens more often after adding the generated candidates back to the training data and refitting the model.

To reproduce

Working Colab example

Stack trace/error message

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-54a7133f99a7> in <module>()
     57               model, train_X)
     58   candidates = joint_optimize(
---> 59   qNEI, bounds=bounds, q=1, num_restarts=15, raw_samples=40); print(evaluate(candidates)) #\
     60   train_X = torch.cat([train_X, candidates]) # Add candidate to training data, and retrain

20 frames
/usr/local/lib/python3.6/dist-packages/gpytorch/utils/eig.py in batch_symeig(mat)
     21 
     22     for i in range(batch_shape.numel()):
---> 23         evals, evecs = mat[i].symeig(eigenvectors=True)
     24         mask = evals.ge(0)
     25         eigenvectors[i] = evecs * mask.type_as(evecs).unsqueeze(0)

RuntimeError: symeig_cpu: the algorithm failed to converge; 2 off-diagonal elements of an intermediate tridiagonal form did not converge to zero.

Expected Behavior

For qNEI to return a candidate without raising an error.

System information

Please complete the following information:

Latest (master) versions of gpytorch and botorch on Google Colab

Additional context

This issue started appearing around 08.08 for me, possibly because the pytorch version was updated to 1.2.
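For reference, the candidate-augmentation loop in the Colab is roughly of the following shape (the objective, dimensionality, and iteration counts here are placeholders). One thing I have not ruled out: the BoTorch tutorials generally run GP fitting in double precision, and using torch.double throughout might make the eigendecomposition more stable, though I have not verified that it avoids the error.

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import qNoisyExpectedImprovement
from botorch.optim import joint_optimize
from gpytorch.mlls import ExactMarginalLogLikelihood

dtype = torch.double  # double precision, as in the BoTorch tutorials

def evaluate(X):
    # placeholder objective standing in for the one in the Colab
    return -(X - 0.5).pow(2).sum(dim=-1, keepdim=True)

bounds = torch.stack([torch.zeros(6, dtype=dtype), torch.ones(6, dtype=dtype)])
train_X = torch.rand(10, 6, dtype=dtype)
train_Y = evaluate(train_X)

for _ in range(20):
    model = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_model(ExactMarginalLogLikelihood(model.likelihood, model))
    qNEI = qNoisyExpectedImprovement(model, X_baseline=train_X)
    candidates = joint_optimize(qNEI, bounds=bounds, q=1, num_restarts=15, raw_samples=40)
    train_X = torch.cat([train_X, candidates])            # add candidate to training data, and retrain
    train_Y = torch.cat([train_Y, evaluate(candidates)])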
