
Stheno.jl's Introduction

Stheno


Stheno is designed to make doing some kinds of non-standard things with Gaussian processes straightforward. It has a simple modeling syntax, is inherently able to handle both multi-input and multi-output problems, and trivially supports interdomain pseudo-point approximations.

We also have a Python version of the package.

Please open issues liberally -- if there's anything that's unclear or doesn't work, we would very much like to know about it.

Installation - ] add Stheno.
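
Equivalently, from within Julia (a minimal sketch using the standard Pkg API):

using Pkg
Pkg.add("Stheno")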

JuliaCon 2019 Talk

Go faster with TemporalGPs.jl

A Couple of Examples

The primary sources of information regarding this package are the documentation and the examples folder, but here are a couple of flashy examples to get started with.

Please raise an issue immediately if either of these examples don't work -- they're not currently included in CI, so there's always a higher chance that they'll be outdated than the internals of the package.

In this first example we define a simple Gaussian process, make observations of different bits of it, and visualise the posterior. We are trivially able to condition on observations of both f₁ and f₃, which isn't something that's typically straightforward.

#
# We'll get going by setting up our model, generating some toy observations,
# and constructing the posterior processes produced by conditioning on these
# observations.
#

using AbstractGPs, Stheno, Random, Plots

# Create a pseudo random number generator for reproducibility.
rng = MersenneTwister(123456);

# Define a distribution over f₁, f₂, and f₃, where f₃(x) = f₁(x) + f₂(x).
# This `GPPP` object is just an `AbstractGPs.AbstractGP` object.
f = @gppp let
    f₁ = GP(randn(rng), SEKernel())
    f₂ = GP(SEKernel())
    f₃ = f₁ + f₂
end;

# Sample `N₁` / `N₃` locations at which to measure `f₁` / `f₃`.
N₁, N₃ = 10, 11;
X₁ = GPPPInput(:f₁, rand(rng, N₁) * 10);
X₃ = GPPPInput(:f₃, rand(rng, N₃) * 10);
X = BlockData(X₁, X₃);

# Pick out the bits of `f` that we're interested in, and the variance
# of the noise under which we plan to measure them.
σ² = 1e-2;
fx = f(X, σ²);

# Sample toy observations of `f₁` / `f₃` at `X₁` / `X₃`.
y = rand(rng, fx);

# You could work backwards to figure out which elements of `y` correspond to
# which of the elements of `X`, but `Stheno.jl` provides methods of `split` to
# do this for you.
ŷ₁, ŷ₃ = split(X, y);

# Compute the logpdf of the observations. Notice that this looks exactly like
# what you would write in AbstractGPs.jl.
l = logpdf(fx, y)

# Compute the ELBO of the observations, with pseudo-points at the same
# locations as the observations. Could have placed them in any of the processes
# in f, even in f₂.
l ≈ elbo(fx, y, f(X))

# Compute the posterior. This is just an `AbstractGPs.PosteriorGP`.
f′ = posterior(fx, y);



#
# There are various things that we can do with a Stheno model.
#

# Sample jointly from the posterior over all of the processes.
Np, S = 500, 11;
X_ = range(-2.5, stop=12.5, length=Np);
Xp1 = GPPPInput(:f₁, X_);
Xp2 = GPPPInput(:f₂, X_);
Xp3 = GPPPInput(:f₃, X_);
Xp = BlockData(Xp1, Xp2, Xp3);
f′_Xp = rand(rng, f′(Xp, 1e-9), S);

# Chop up posterior samples using `split`.
f₁′Xp, f₂′Xp, f₃′Xp = split(Xp, f′_Xp);



#
# We make use of the plotting recipes in AbstractGPs to plot the marginals,
# and manually plot the joint posterior samples.
#

# Instantiate plot and choose backend.
plotly();
posterior_plot = plot();

# Plot posterior over f1.
plot!(posterior_plot, X_, f′(Xp1); color=:red, label="f1");
plot!(posterior_plot, X_, f₁′Xp; color=:red, label="", alpha=0.2, linewidth=1);
scatter!(posterior_plot, X₁.x, ŷ₁;
    markercolor=:red,
    markershape=:circle,
    markerstrokewidth=0.0,
    markersize=4,
    markeralpha=0.7,
    label="",
);

# Plot posterior over f2.
plot!(posterior_plot, X_, f′(Xp2); color=:green, label="f2");
plot!(posterior_plot, X_, f₂′Xp; color=:green, label="", alpha=0.2, linewidth=1);

# Plot posterior over f3
plot!(posterior_plot, X_, f′(Xp3); color=:blue, label="f3");
plot!(posterior_plot, X_, f₃′Xp; color=:blue, label="", alpha=0.2, linewidth=1);
scatter!(posterior_plot, X₃.x, ŷ₃;
    markercolor=:blue,
    markershape=:circle,
    markerstrokewidth=0.0,
    markersize=4,
    markeralpha=0.7,
    label="",
);

display(posterior_plot);

In the above figure, we have visualised the posterior distribution of all of the processes. Bold lines are posterior means, and shaded areas are three posterior standard deviations from these means. Thin lines are samples from the posterior processes.

In this next example we make observations of two different noisy versions of the same latent process. Again, this is just about doable in existing GP packages if you know what you're doing, but isn't straightforward.

using AbstractGPs, Stheno, Random, Plots

# Create a pseudo random number generator for reproducibility.
rng = MersenneTwister(123456);

# Construct a Gaussian Process Probabilistic Programme,
# which is just an AbstractGP.
f = @gppp let

    # Define a smooth latent process that we wish to infer.
    f = GP(SEKernel())

    # Define the two noise processes described.
    g = x->sin.(x) .- 5.0 .+ sqrt.(abs.(x))
    noise1 = sqrt(1e-2) * GP(WhiteKernel()) + g
    noise2 = sqrt(1e-1) * GP(3.5, WhiteKernel())

    # Define the processes that we get to observe.
    y1 = f + noise1
    y2 = f + noise2
end;

# Generate some toy observations of `y1` and `y2`.
X1 = GPPPInput(:y1, rand(rng, 3) * 10);
X2 = GPPPInput(:y2, rand(rng, 10) * 10);
X = BlockData(X1, X2);
y = rand(rng, f(X));
ŷ1, ŷ2 = split(X, y);

# Compute the posterior GPPP.
f′ = posterior(f(X), y);

# Sample jointly from the posterior processes.
X_ = range(-2.5, stop=12.5, length=500);
Xp_f = GPPPInput(:f, X_);
Xp_y1 = GPPPInput(:y1, X_);
Xp_y2 = GPPPInput(:y2, X_);
Xp = BlockData(Xp_f, Xp_y1, Xp_y2);

# Sample jointly from posterior over process, and split up the result.
f′Xp, y1′Xp, y2′Xp = split(Xp, rand(rng, f′(Xp, 1e-9), 11));

# Compute and split up posterior marginals. We're not using the plotting
# recipes from AbstractGPs here, to make it clear how one might compute the
# posterior marginals manually.
ms = marginals(f′(Xp, 1e-9));
μf′, μy1′, μy2′ = split(Xp, mean.(ms));
σf′, σy1′, σy2′ = split(Xp, std.(ms));

# Instantiate plot and choose backend.
plotly();
posterior_plot = plot();

# Plot posterior over y1.
plot!(posterior_plot, X_, μy1′; color=:red, ribbon=3σy1′, label="", fillalpha=0.3);
plot!(posterior_plot, X_, y1′Xp; color=:red, label="", alpha=0.2, linewidth=1);
scatter!(posterior_plot, X1.x, ŷ1;
    markercolor=:red,
    markershape=:circle,
    markerstrokewidth=0.0,
    markersize=4,
    markeralpha=0.8,
    label="Sensor 1",
);

# Plot posterior over y2.
plot!(posterior_plot, X_, μy2′; color=:green, ribbon=3σy2′, label="", fillalpha=0.3);
plot!(posterior_plot, X_, y2′Xp; color=:green, label="", alpha=0.2, linewidth=1);
scatter!(posterior_plot, X2.x, ŷ2;
    markercolor=:green,
    markershape=:circle,
    markerstrokewidth=0.0,
    markersize=4,
    markeralpha=0.8,
    label="Sensor 2",
);

# Plot posterior over f.
plot!(posterior_plot, X_, μf′; color=:blue, ribbon=3σf′, label="f", fillalpha=0.3);
plot!(posterior_plot, X_, f′Xp; color=:blue, label="", alpha=0.2, linewidth=1);

display(posterior_plot);

As before, we visualise the posterior distribution through its marginal statistics and joint samples. Note that the posterior samples over the unobserved process are (unsurprisingly) smooth, whereas the posterior samples over the noisy processes still look uncorrelated and noise-like.

See the docs for more examples.

Hyperparameter learning and inference

Fortunately, there is really no need for this package to explicitly provide support for hyperparameter optimisation, as the functionality is already available elsewhere -- it's sufficient that it plays nicely with other standard packages in the ecosystem such as Zygote.jl (reverse-mode algorithmic differentiation), Optim.jl (non-linear optimisation), AdvancedHMC.jl (Hamiltonian Monte Carlo / NUTS), ParameterHandling.jl, and Soss.jl / Turing.jl. For concrete examples of the use of each of these packages in conjunction with Stheno, see the Getting Started section of the (dev) docs.
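
For a flavour of what that looks like, here is a minimal sketch, not taken from the docs, of learning kernel parameters by maximising the log marginal likelihood with Zygote.jl and Optim.jl. The toy data, the unpack function, the use of with_lengthscale from KernelFunctions.jl, and the optimiser call are all illustrative assumptions rather than the documented recipe.

using AbstractGPs, Stheno, KernelFunctions, Zygote, Optim

# Toy data: noisy observations of a single process wrapped in a GPPP.
x = GPPPInput(:f, collect(range(-3.0, 3.0; length=50)));
y = sin.(x.x) .+ 0.1 .* randn(50);

# Map unconstrained parameters to (variance, length scale, noise variance).
unpack(θ) = exp.(θ) .+ 1e-6

# Negative log marginal likelihood as a function of the unconstrained parameters.
function nlml(θ)
    σ², l, σ²_n = unpack(θ)
    f = @gppp let
        f = GP(σ² * with_lengthscale(SEKernel(), l))
    end
    return -logpdf(f(x, σ²_n), y)
end

# Optimise using gradients from Zygote.
result = optimize(nlml, θ -> only(Zygote.gradient(nlml, θ)), randn(3), BFGS(); inplace=false)
σ², l, σ²_n = unpack(Optim.minimizer(result))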

Non-Gaussian problems

As with hyperparameter learning, Stheno doesn't support non-Gaussian likelihoods directly. Rather, we expect users to obtain this functionality through the general functionality provided in AbstractGPs. This part of the JuliaGaussianProcesses ecosystem is under development in ApproximateGPs.jl -- everything there should interoperate well with Stheno.jl.

GPs + Deep Learning

Again, explicit support is not included. Rather, just use Stheno in conjunction with Flux.jl and related packages.


Stheno.jl's Issues

Input Transformations

Problem

The current interface for handling input transformations is ugly. To endow a kernel / mean function with a particular length scale we must do something like

k′ = InputTransform(k, f)

where f is some function to be applied to the input to k before it is computed.

Specification

I propose to introduce a dummy-type called Input to which one can apply an arbitrary function, and an arbitrary composition of functions by extension. The type would merely keep track of what operations are applied to it. For example

x = Input()
x′ = f(x)

would yield another input type which knows that it was constructed from a "leaf" Input type. I further propose that

x = f(Input())
μ(x) == InputTransform(μ, f)
k(x) == InputTransform(k, f)
GP(x, μ, k, gpc) == GP(μ(x), k(x), gpc)

A consequence of this is the following elegant canonical model declaration format:

function model(x::Input, gpc::GPC)
    x1 = t1(x)
    x2 = t2(x)
    f = GP(x1, μ1, k1, gpc)
    g = GP(x2, μ2, k2, gpc)
    h = GP(x2, μ3, k3, gpc)
    # some other code combining f, g, and h and returning stuff.
end

Note that we have got parameter tying for free as g and h share the same input parameters in this example.

Implementation

I need to work out the details for this, but it's definitely doable with some tape-based tracking. It's not clear from the specification precisely how we should get the transformed parameters to the kernels and mean functions. Possibly the tape should be managed by the gpc, and kernels / mean functions have to ask the gpc for the appropriate input.

Example Request, simple GP Regression

I was not able to understand from the documentation and given examples how to proceed in order to implement a simple regression. For example, using GaussianProcesses.jl I would do:

using Plots, GaussianProcesses

xt =  [-4., -2.5, -1., 0., 2.]
yt = [-2., 0., .1, 2., -1.]

gp = GP(xt,yt, MeanZero(), SE(0.0,0.0))
μ, σ² = predict_y(gp,range(-5, 3, length=100))

plot(gp)

I understand that Stheno is more about composing GPs and other operations, and because of that uses an interface that is very different from the "Scikitlearn"-based one that GaussianProcesses.jl adopted and that people are used to.

Thanks
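
For reference, here is a rough sketch of the same fit using the GPPP interface shown in the README above. It is a sketch only; the exact prediction and plotting details may differ between Stheno versions.

using AbstractGPs, Stheno

xt = [-4.0, -2.5, -1.0, 0.0, 2.0]
yt = [-2.0, 0.0, 0.1, 2.0, -1.0]

# Wrap a single latent process in a GPPP so that we can use GPPPInput.
f = @gppp let
    f = GP(SEKernel())
end;

# Condition on noisy observations of f at xt.
x = GPPPInput(:f, xt);
f_post = posterior(f(x, 0.1), yt);

# Posterior marginals over a grid of test points.
xp = GPPPInput(:f, collect(range(-5.0, 3.0; length=100)));
ms = marginals(f_post(xp, 1e-9));
μ, σ = mean.(ms), std.(ms);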

Issue when adding Stheno after Turing

I am using Julia 1.3.1. After adding Turing, I have added Stheno, which works fine. However, typing using Stheno produces the following error:

julia> using Stheno
[ Info: Precompiling Stheno [8188c328-b5d6-583d-959b-9690869a5511]
ERROR: LoadError: LoadError: UndefVarError: cumulsizes not defined
Stacktrace:
[1] include at ./boot.jl:328 [inlined]
[2] include_relative(::Module, ::String) at ./loading.jl:1105
[3] include at ./Base.jl:31 [inlined]
[4] include(::String) at /home/kongi/.julia/packages/Stheno/m0I3t/src/Stheno.jl:1
[5] top-level scope at /home/kongi/.julia/packages/Stheno/m0I3t/src/Stheno.jl:32
[6] include at ./boot.jl:328 [inlined]
[7] include_relative(::Module, ::String) at ./loading.jl:1105
[8] include(::Module, ::String) at ./Base.jl:31
[9] top-level scope at none:2
[10] eval at ./boot.jl:330 [inlined]
[11] eval(::Expr) at ./client.jl:425
[12] top-level scope at ./none:3
in expression starting at /home/kongi/.julia/packages/Stheno/m0I3t/src/util/block_arrays/dense.jl:4
in expression starting at /home/kongi/.julia/packages/Stheno/m0I3t/src/Stheno.jl:32
ERROR: Failed to precompile Stheno [8188c328-b5d6-583d-959b-9690869a5511] to /home/kongi/.julia/compiled/v1.3/Stheno/sVmvC_ZroTh.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1283
[3] _require(::Base.PkgId) at ./loading.jl:1024
[4] require(::Base.PkgId) at ./loading.jl:922
[5] require(::Module, ::Symbol) at ./loading.jl:917

Plotting Docs

The new plotting functionality, courtesy of @NathBo, is great. It's currently undocumented other than in the readme, though. It would be good if a page could be added to the docs describing the API and giving a couple of basic usage examples.

Example using Turing + Stheno

I was trying to put together a simple example using Stheno and Turing. Based on other examples I thought the below would work but I am getting an error about a missing method. Any ideas what is going wrong? I'm using Julia 1.5.2, Stheno v0.6.15 and Turing v0.14.10.

using Turing
using Distributions
using Random
using Stheno

# make some fake data
l = 0.4
σ² = 1.3
σ²_n = 0.05 # noise
x_ = collect(range(-4.0, 4.0; length=100))
k = σ² * stretch(Matern52(), 1 / l)
f = GP(k, GPC())
fx = f(x_,σ²_n) 
y_ = rand(fx)

# prior model
@model gp0(y,x) = begin
    # priors
    σ² ~ LogNormal(0, 1)
    l ~ LogNormal(0, 1)
    σ²_n ~ LogNormal(0, 1)  
    k =  σ² * stretch(Matern52(), 1 ./ l)
    gp = GP(k, GPC())
    # y ~ normal(GP,σ²_n)
    y ~ gp(x,σ²_n + 1e-3)
end

# sample from posterior
m = gp0(y_, x_)
@time chain = sample(m, HMC(0.01, 100), 10)

Resulting error message:

ERROR: MethodError: no method matching logdetcov(::Stheno.FiniteGP{GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Array{Float64,1},Stheno.Stretched{Array{Float64,1},Matern52,typeof(identity)},typeof(identity)}},Array{Float64,1},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}})

Thanks for any help - really like the package.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

FillArrays

Most FillArrays-related functionality has been removed while simplifications to the package were introduced to remove maintenance overhead while pushing towards 1.0. It might be helpful to re-introduce it at some point.

This functionality was present in version 0.4.0 of Stheno, although not used.

Automating collation of relevant bibliographic info

For any given model written using Stheno, it is quite possible that there will be several relevant publications. Consequently, it would be useful to automate the collation of a list of publications relevant to a particular model from its code.

One way to achieve this is to use Cassette to replace the usual method invocations with methods that instead return bibliographic information relevant to that method (nothing by default). In this way, when a new function / method is added to Stheno, relevant bibliographic information can be provided by the implementer.

Shared data for finite GP

First, great job on the library. It is really cool.

You seem to be interested in how people use your library, so I thought I would give one quick thought. I use it for Bayesian optimization, and the general procedure is to compute a GP, select a new point, compute a new GP, etc. This makes it slightly annoying that Stheno uses shared data when you pass the data to a GP. I show an example of that below.

Also, I don't think the shared data has any real benefit, as plotting the GP fails if you change the data and don't recompute the GP. This is also shown in the example.

There is a simple solution: just add copy(X_data) to the function call. But since I don't think using shared data serves any purpose, I thought I would at least let you know my use case.

X_data = rand(2) .* 10 .- 5
y_data = sin.(X_data)
f = GP(1. * Matern52(), GPC())
f_p = f | Obs(f(X_data), y_data)

push!(X_data, 1) # fails because of shared data
X_data[2] = -2 
plot(f_p(collect(-5:.02:5)), samples = 10, color = "blue") # fails since the gp didn't update 

Multi-output GP?

The readme mentions multi-input and multi-output GPs, but I can't seem to find the functionality to handle multiple outputs. For now I'm just interested in independent outputs, so the most straightforward way might be to just maintain a list of GPs?
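
One option for independent outputs is to place each output in its own process within a single GPPP. This is a sketch based on the GPPP interface from the README above, so it assumes the current AbstractGPs-based API; the kernels and input locations are arbitrary.

using AbstractGPs, Stheno

# Two independent outputs modelled as separate processes in one GPPP.
f = @gppp let
    f₁ = GP(SEKernel())
    f₂ = GP(Matern52Kernel())
end;

# Observe each output at (possibly different) input locations.
x₁ = GPPPInput(:f₁, randn(10));
x₂ = GPPPInput(:f₂, randn(7));
x = BlockData(x₁, x₂);
y = rand(f(x, 0.1));

# A single posterior object covering both outputs.
f_post = posterior(f(x, 0.1), y);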

Modeling Derivatives et al

I'm opening this to scope out + plan the best manner in which Stheno.jl can be extended (or, rather, re-extended -- we used to have functionality that did this but it had some serious limitations) to model the derivatives and related quantities (gradients, hessians, jacobians, etc) of functions.

There are two main things to consider:

  1. what is the scope of the things that we want to handle, in particular do we want to cover both multiple inputs and higher-order derivatives in an initial pass?
  2. how to actually implement these things. For example, we've recently added support for multi-output GPs in KernelFunctions.jl, which might be helpful for e.g. modeling the gradient of a GP. I am very keen to avoid re-inventing the wheel, so perhaps a pre-requisite is to get Stheno.jl using KernelFunctions.jl?

Adding sparse finite GP type for integration with Turing

A continuation/focusing of this discussion on Discourse. Defining the following structure and methods allows using a sparse GP in a Turing model:

struct SparseFiniteGP{T1<:Stheno.FiniteGP, T2<:Stheno.FiniteGP} <: Distributions.AbstractMvNormal
    fobs::T1
    finducing::T2
end

function Stheno.logpdf(f::T, y::AbstractArray{T1, 1}) where {T <: SparseFiniteGP, T1 <: Real}
    return elbo(f.fobs, y, f.finducing)
end

Base.length(f::SparseFiniteGP) = length(f.fobs)
Distributions._rand!(rng::Random._GLOBAL_RNG, f::SparseFiniteGP, x::Array{T}) where {T<:Real} = rand(f.fobs)

An example model is shown below. Using the SparseFiniteGP speeds up inference dramatically for a simulated dataset with ~1000 observations.

@model function example_model_sparse(x, z, xu)
    f = GP(Matern32(), GPC())
    fsparse = SparseFiniteGP(f(x, 1e-6),  f(xu, 1e-6))
    y ~ fsparse
    λ = exp.(y)
    z .~ Poisson.(λ)
end
chn_sparse = sample(example_model_sparse(x, z, xu), NUTS(), 200)

Some questions to discuss while I put together a PR...

  • Any thoughts on the interface or structure of the types?
  • Besides rand and logpdf, what other methods does this new type need?
  • Where should this code go? In src/abstract_gp.jl?
  • What tests are needed--in particular, what comparisons should there be between the sparse and dense GPs?

Parameter handling

Problem

There is currently no pleasant mechanism for handling parameters in one of Stheno's models. For example, the following is an example of the current best way to handle parameters:

function model(θ::AbstractVector, gpc::GPC)
    f = GP(μ1(θ[1]), k1(θ[2:3]), gpc)
    g = GP(μ2(θ[4:6]), k2(θ[7]), gpc)
    # some other code combining f, and g, and returning stuff.
end

The above code clearly doesn't scale well to problems where one requires a large number of parameters; each time one changes the model function there is the possibility that the way θ is indexed will need to change.

Note that this is a general problem in Machine Learning. Whatever solution I come up with here is a stop-gap on the road to a generic Julia solution being reached.

Specification

I propose to construct model parameters from a combination of a Vector and an Int, plus a method to retrieve the "next" set of free parameters.

struct ParVector{V<:AbstractVector}
    v::V
    n::Base.RefValue{Int}
    ParVector(v::V) where V<:AbstractVector = new{V}(v, Ref(0))
end
function getnext!(θ::ParVector, N::Int)
    out = θ.v[θ.n[]+1:θ.n[]+N]
    θ.n[] += N
    return out
end

Using the above type, one could then re-write the previous model as follows:

function model(θ::AbstractVector, gpc::GPC)
    f = GP(μ1(θ), k1(θ), gpc)
    g = GP(μ2(θ), k2(θ), gpc)
    # some other code combining f, and g, and returning stuff.
end

where each mean function and kernel simply calls the getnext! method of θ with whatever value of N it requires.

Implementation

The above forms the basics of an implementation, but the following details need to be worked out:

  • How can one determine the appropriate length for θ?
  • How can one efficiently specify initialisations for θ?
  • How can it be made straightforward to determine which element of θ corresponds to what component in the model?
  • What is the minimal interface a mean function / kernel (or other) needs to implement to take advantage of this type?

Issues

  • It's not immediately clear how parameter tying would work here.

Flux compatible types that transform inputs

Based on our discussion here, we agreed to design some types to transform input variables. We would like these types to work seamlessly with Flux's API; the transformations in KernelFunctions.jl are good examples.

Could we achieve it by just letting Stheno.jl work with KernelFunctions.jl ?

Support for precomputed kernels

Often you don't have (or need) a function to compute a kernel matrix on the fly, but you already have a precomputed Gram matrix. Example code showing how I solved this (because I needed a diffusion kernel):

struct Diffusion <: Kernel
    K
    function Diffusion(A::AbstractMatrix, β::Real=1.0; norm=false)
        L = laplacian(A)
        if norm
            L = norm_laplacian(L)
        end
        return new(exp(β * L))
    end
end

#Diffusion(E::Eigen, β::Real=1.0) = Diffusion(diffusion(E, β))

ew(k::Diffusion, x::AbstractVector{<:Int}) = [k.K[i,i] for i in x]
ew(k::Diffusion, x::AbstractVector{<:Int}, x′::AbstractVector{<:Int}) = [k.K[i,j] for (i,j) in zip(x, x′)]

pw(k::Diffusion, x::AbstractVector{<:Int}) = k.K[x,x]
pw(k::Diffusion, x::AbstractVector{<:Int}, x′::AbstractVector{<:Int}) = k.K[x,x′]

Maybe we should have PrecomputedKernel type to mediate this?

can't install with unsatisfiable requirements detected

Hi, I'm new to Julia so the issue might be there, too. I installed Julia yesterday and wanted to try out some Bayesian stuff and found the package - looks great!
However, I couldn't install it, and because Julia is fresh, both in my skills as well as on my computer, I thought I'd ask. The stacktrace is below; I am using Ubuntu 18.04 if that helps.

ERROR: Unsatisfiable requirements detected for package Stheno [8188c328]:
Stheno [8188c328] log:
├─possible versions are: [0.1.0-0.1.1, 0.2.0-0.2.1, 0.3.0-0.3.2, 0.4.0-0.4.2, 0.5.0, 0.6.0-0.6.4] or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions [0.1.0-0.1.1, 0.2.0-0.2.1, 0.3.0-0.3.2, 0.4.0-0.4.2, 0.5.0, 0.6.0-0.6.4]
├─restricted by julia compatibility requirements to versions: 0.1.0-0.1.1 or uninstalled, leaving only versions: 0.1.0-0.1.1
└─restricted by compatibility requirements with FillArrays [1a297f60] to versions: [0.2.0-0.2.1, 0.3.0-0.3.2, 0.4.1-0.4.2, 0.5.0, 0.6.0-0.6.4] or uninstalled — no versions left
└─FillArrays [1a297f60] log:
├─possible versions are: [0.2.0-0.2.1, 0.3.0, 0.4.0, 0.5.0, 0.6.0-0.6.4, 0.7.0-0.7.4, 0.8.0-0.8.10] or uninstalled
└─restricted by compatibility requirements with Distributions [31c24e10] to versions: 0.8.0-0.8.10
└─Distributions [31c24e10] log:
├─possible versions are: [0.16.0-0.16.4, 0.17.0, 0.18.0, 0.19.1-0.19.2, 0.20.0, 0.21.0-0.21.3, 0.21.5-0.21.12, 0.22.0-0.22.6, 0.23.0-0.23.4] or uninstalled
└─restricted to versions 0.23.4 by an explicit requirement, leaving only versions 0.23.4
Stacktrace:
[1] #propagate_constraints!#61(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int32}) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1007
[2] propagate_constraints! at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:948 [inlined]
[3] #simplify_graph!#121(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int32}) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1462
[4] simplify_graph! at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1462 [inlined] (repeats 2 times)
[5] resolve_versions!(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}, ::Nothing) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:373
[6] resolve_versions! at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:316 [inlined]
[7] #add_or_develop#62(::Array{Base.UUID,1}, ::Symbol, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:1201
[8] #add_or_develop at ./none:0 [inlined]
[9] #add_or_develop#15(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69
[10] #add_or_develop at ./none:0 [inlined]
[11] #add_or_develop#14 at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:34 [inlined]
[12] #add_or_develop at ./none:0 [inlined]
[13] #add_or_develop#11 at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:33 [inlined]
[14] #add_or_develop at ./none:0 [inlined]
[15] #add_or_develop#10 at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:32 [inlined]
[16] #add_or_develop at ./none:0 [inlined]
[17] #add#20 at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:74 [inlined]
[18] add(::String) at /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:74
[19] top-level scope at none:0

Bayesian quadrature?

Hi @willtebbutt , curious if you've considered using Stheno for BQ, or what this could look like.

A "static" probabilistic program gives a factorization of the likelihood, and Stheno could conceivably (I think) be used to represent these same dependencies as in a GP.

From what I've seen, the choice of optimal evaluation points for evaluating ∫f(x)p(x) dx doesn't depend on f, but only on p. So it could be done once and for all, and then re-used.

Does this make sense? I realize my description here is a bit hand-wavy

Cross-Validation (pseudo-likelihood) Criterion

I am finding for a specific use case for Gaussian Process Regression that optimizing kernel hyperparameters using the marginal likelihood logpdf results in poor generalization.

I would like to try optimizing the kernel hyperparameters instead with respect to the pseudo-likelihood, Equation 5.11/5.12 in Rasmussen and Williams Chapter 5.

I'm happy to make an attempt at implementing this, but wondering if there have been any previous thoughts about doing this, or any suggestions on how to structure.
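
For reference, the LOO pseudo-likelihood can be computed from a single factorisation of the (noisy) kernel matrix. A minimal standalone sketch of the quantities from Rasmussen and Williams Chapter 5 (plain Julia, not using Stheno's internals; the function name is made up) might look like:

using LinearAlgebra

# Leave-one-out (pseudo-likelihood) criterion from Rasmussen & Williams Ch. 5.
# K is the kernel matrix at the training inputs, σ²_n the observation noise.
function loo_pseudolikelihood(K::AbstractMatrix, y::AbstractVector, σ²_n::Real)
    C = cholesky(Symmetric(K + σ²_n * I))
    α = C \ y                          # (K + σ²I)⁻¹ y
    Kinv_diag = diag(inv(C))           # diagonal of (K + σ²I)⁻¹
    σ²_loo = 1 ./ Kinv_diag            # LOO predictive variances
    μ_loo = y .- α ./ Kinv_diag        # LOO predictive means
    return sum(-0.5 .* log.(2π .* σ²_loo) .- (y .- μ_loo).^2 ./ (2 .* σ²_loo))
end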

Additive kernels

I started writing code to implement additive kernels as in Additive Gaussian Processes.

Here's how I imagine usage (similar to the Stretched kernel, where the struct has a field to wrap a kernel).

# Additive kernel
struct Additive{
		Dim <: Int, 
		Ker <: Kernel
		} <: Kernel
    dim::Int
    ker::Ker
    
    # Inner constructor.
    Additive(dim::Int, ker::K) where {K <: Kernel} = new{Int, Kernel}(dim, ker)
end

# Binary methods
ew(k::Additive, x::AbstractVector, x′::AbstractVector) = ew(k.ker, x[k.dim, :], x′[k.dim, :])
pw(k::Additive, x::AbstractVector, x′::AbstractVector) = pw(k.ker, x[k.dim, :], x′[k.dim, :])
ew(k::Additive, x::ColVecs, x′::ColVecs) = ew(k.ker, x.X[k.dim, :], x′.X[k.dim, :])
pw(k::Additive, x::ColVecs, x′::ColVecs) = pw(k.ker, x.X[k.dim, :], x′.X[k.dim, :])

# Unary methods
ew(k::Additive, x::AbstractVector) = ew(k.ker, x[k.dim, :])
pw(k::Additive, x::AbstractVector) = pw(k.ker, x[k.dim, :])
ew(k::Additive, x::ColVecs) = ew(k.ker, x.X[k.dim, :])
pw(k::Additive, x::ColVecs) = pw(k.ker, x.X[k.dim, :])

# Here's an additive model implementation. This is a Gaussian process with a sum of simple squared-exponential kernels, scaled by a length scale.
@model function AdditiveGPModel(dim_features)
    params = [(randn(rng), randn(rng)) for i in 1:dim_features]
    ker_array = [params[i][1] * Additive(i, EQ()) for i in 1:dim_features]
    k = foldl(+, ker_array)
    additive_gp = GP(k, GPC())
    return additive_gp, ker_array, params
end

# Make model.
x = ColVecs(randn(rng, (10, 1)))
y = [3.0]
f₁, collection, params = AdditiveGPModel(10)

# Try inference.
fx = f₁(x)
post = f₁ | Obs(fx, y)

This code doesn't work yet - error is:
MethodError: objects of type Stheno.GPC are not callable

Would this be an interesting enhancement?

0.7 Roadmap

  • Complete initial overhaul of internals in #161
  • Re-instate SparseFiniteGP functionality
  • Update docs with new API + ParameterHandling
  • Update examples with new API
  • Release 0.7 + announce on discourse

Post-release TODO:

  • tidy up tests
  • improve test coverage
  • reinstate Flux integration tests

Numerical instability in pairwise when taking gradient

Contrived MWE:

using Stheno
using Zygote

k = 1.3 * stretch(matern52(), 1 / 0.4)
f = GP(k, GPC())
const x = randn(100)
σ²_n = 1e-3
fx = f(x, σ²_n)
const y = rand(fx)

f_posterior = f | Obs(fx, y)
d = 1e-12
Zygote.gradient(x′ -> Stheno.mean_vector(f_posterior, x′), x .+ d)

(this reproduces the issue reliably for me)

This results in:

ERROR: DomainError with -1.3877787807814457e-17:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x))
Stacktrace:
 [1] throw_complex_domainerror(::Symbol, ::Float64) at ./math.jl:31
 [2] sqrt at ./math.jl:492 [inlined]
 [3] _broadcast_getindex_evalf at ./broadcast.jl:578 [inlined]
 [4] _broadcast_getindex at ./broadcast.jl:551 [inlined]
 [5] getindex at ./broadcast.jl:511 [inlined]
 [6] macro expansion at ./broadcast.jl:843 [inlined]
 [7] macro expansion at ./simdloop.jl:73 [inlined]
 [8] copyto! at ./broadcast.jl:842 [inlined]
 [9] copyto! at ./broadcast.jl:797 [inlined]
 [10] materialize! at ./broadcast.jl:756 [inlined]
 [11] #adjoint#20 at /Users/pawel/.julia/packages/Stheno/j89z0/src/util/zygote_rules.jl:43 [inlined]
 [12] #adjoint at ./none:0 [inlined]
 [13] _pullback at /Users/pawel/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:53 [inlined]
 [14] pairwise at /Users/pawel/.julia/packages/Stheno/j89z0/src/util/distances.jl:8 [inlined]
 [15] pairwise at /Users/pawel/.julia/packages/Stheno/j89z0/src/gp/kernel.jl:175 [inlined]
 [16] _pullback(::Zygote.Context, ::typeof(Distances.pairwise), ::Stheno.Matern52, ::Array{Float64,1}, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface2.jl:0
 [17] pairwise at /Users/pawel/.julia/packages/Stheno/j89z0/src/gp/kernel.jl:447 [inlined]
 [18] pairwise at /Users/pawel/.julia/packages/Stheno/j89z0/src/gp/kernel.jl:425 [inlined]
 [19] _pullback(::Zygote.Context, ::typeof(Distances.pairwise), ::Stheno.Scaled{Float64,Stheno.Stretched{Float64,Stheno.Matern52}}, ::Array{Float64,1}, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface2.jl:0
 [20] cov at /Users/pawel/.julia/packages/Stheno/j89z0/src/gp/gp.jl:31 [inlined]
 [21] cov at /Users/pawel/.julia/packages/Stheno/j89z0/src/gp/gp.jl:35 [inlined]
 [22] mean_vector at /Users/pawel/.julia/packages/Stheno/j89z0/src/composite/conditioning.jl:16 [inlined]
 [23] _pullback(::Zygote.Context, ::typeof(Stheno.mean_vector), ::Tuple{typeof(|),GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Float64,Stheno.Stretched{Float64,Stheno.Matern52}}},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Float64,1},Stheno.FiniteGP{GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Float64,Stheno.Stretched{Float64,Stheno.Matern52}}},Array{Float64,1},LinearAlgebra.Diagonal{Float64,Array{Float64,1}}},Array{Float64,1}}, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface2.jl:0
 [24] mean_vector at /Users/pawel/.julia/packages/Stheno/j89z0/src/composite/composite_gp.jl:18 [inlined]
 [25] _pullback(::Zygote.Context, ::typeof(Stheno.mean_vector), ::Stheno.CompositeGP{Tuple{typeof(|),GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Float64,Stheno.Stretched{Float64,Stheno.Matern52}}},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Float64,1},Stheno.FiniteGP{GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Float64,Stheno.Stretched{Float64,Stheno.Matern52}}},Array{Float64,1},LinearAlgebra.Diagonal{Float64,Array{Float64,1}}},Array{Float64,1}}}, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface2.jl:0
 [26] #5 at ./REPL[10]:1 [inlined]
 [27] _pullback(::Zygote.Context, ::getfield(Main, Symbol("##5#6")), ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface2.jl:0
 [28] _pullback(::Function, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface.jl:31
 [29] pullback(::Function, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface.jl:37
 [30] gradient(::Function, ::Array{Float64,1}) at /Users/pawel/.julia/packages/Zygote/oPQFy/src/compiler/interface.jl:46
 [31] top-level scope at none:0

There's a sqrt being taken in the pairwise distance calculation; numerical instabilities for points close to the observed points cause negative values here.

Even when the error is not thrown, the above can produce NaN values in the gradient (for instance, d=1e-12 produces errors, d=1e-8 produces NaN values, and d=1e-6 seems to work ok, but this depends on x).

Plotting Recipes

It's useful to be able to produce nice pretty plots of GPs in 1 dimension. Currently, this is a bit of a pain because no plotting recipes have been implemented. If we implemented some, it would significantly simplify a lot of the plotting that we do.

(If you're interested in making this happen, I'm happy to discuss)

Remove Julia jit time note from readme

You should remove the note about jit performance in the README.

First, a note for statistics / ML people who aren't too familiar with Julia: the first execution of the examples below will take a while as Julia has to compile the code. On subsequent runs (e.g. if you were repeatedly evaluating the logpdf for kernel parameter learning) it will progress much faster.

It's right at the top, and first impressions matter. I also think that if a user gets to the point where they're running examples they've likely encountered jit time vs subsequent execution times. Regardless, I don't think this readme is the ideal place for that explanation.

Question: GP regression with multi-dimensional inputs

I just wanted to play around with Stheno.jl for some simple GP regression problems, but I had trouble extending the very nice examples from https://willtebbutt.github.io/Stheno.jl/stable/getting_started/ to multi-dimensional inputs.

I have a minimal template of a standard GP regression on one-dimensional data here:

using Stheno

N = 100
X = randn(N)
y = randn(N)

f = GP(σ² * stretch(eq(), 1 / l), GPC())
f_posterior = f | (f(X, 0.05) ← y)

How can I modify this example in order to allow for multi-dimensional inputs

X = randn(N, d)

for some dimension d?
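
One option (a sketch only, keeping the kernel constructors from the question) is to rely on the ColVecs type used elsewhere in Stheno, which treats each column of a d×N matrix as one input point:

using Stheno

N, d = 100, 3
σ², l = 1.0, 0.5
X = randn(N, d)
y = randn(N)

# ColVecs interprets each column of a d×N matrix as a single input point,
# so pass the transposed design matrix.
x = ColVecs(collect(X'))

f = GP(σ² * stretch(eq(), 1 / l), GPC())
f_posterior = f | Obs(f(x, 0.05), y)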

Example with custom kernel

The documentation is a bit hard to follow, IMO. Could you make an example where a custom kernel is created and used?

In my specific case, I would like to make a Jaccard kernel to learn over sets. This does not seem to work.

using Stheno: Kernel

jaccard(x, y) = length(intersect(x, y)) / length(union(x, y))
jaccard(x) = 1.0

struct JaccardKernel <: Kernel end

# Binary methods
function ew(k::JaccardKernel, x::AbstractVector, x′::AbstractVector)
    @assert length(x) == length(x′)
    p = length(x)
    r = zeros(p)
    for i in 1:p
        r[i] = jaccard(x[i], x′[i])
    end
    return r
end

function pw(k::JaccardKernel, x::AbstractVector, x′::AbstractVector)
     p, q = length(x), length(x′)
     R = zeros(p, q)
     for j in 1:q
         for i in 1:p
             R[i,j] = jaccard(x[i], x′[j])
         end
     end
     return R
 end
# Unary methods
ew(k::JaccardKernel, x::AbstractVector) = ones(length(x))
function pw(k::JaccardKernel, x::AbstractVector)
    p = length(x)
    R = zeros(p, p)
    for i in 1:p
        for j in 1:i
            R[i,j] = jaccard(x[i], x[j])
            R[j,i] = R[i,j]
        end
    end
    return R
end

using Stheno
using Stheno: @model

@model function model()

    # Define a smooth latent process.
    f = GP(0.0, JaccardKernel())

    # Define a latent noise process.
    ε = GP(eq())

    # Sum them to get the process of which we shall make observations.
    y = f + ε

    # Return all three processes so that we can inspect all of them.
    return f, ε, y
end
    
f, ε, y = model();

rand(f([[1,1], [2,1]]))

(don't mind implementation specifics)

Am I doing something fundamentally wrong here?

Kronecker-factored matrices

Consider a kernel of the form

(k)(x::Vector, x′::Vector) = prod(map(k.ks, x, x′))

where k.ks is a collection of kernels that can each be applied to a single dimension, and where the dimensionality of x and x′ and the length of k.ks are all D.
If we wish to evaluate the covariance matrix associated with a matrix of inputs which can be represented as a Cartesian product:

X = IterTools.product(x1, ..., xD)

where each xd is a vector, then cov(k, X) can be represented as the Kronecker product of a set of D matrices, the dth of which is cov(k.ks[d], xd). If this structure is present then the asymptotic complexity of inference is greatly reduced.

  • Details of this can be found in [1].
  • There was work at NIPS 2017 in the AABI workshop which maybe relates to this work - see [2].
  • There is no obvious reason why one shouldn't be able to exploit matrices which admit partial Kronecker factorisations. I'm not sure what / whether there is relevant work here, though.
  • Perhaps there is something in the tensor decomposition / tensor train field / line-of-work (I'm not even sure what to call this) that can speak to the above. This video looks related: https://www.youtube.com/watch?v=CrSFne4h3Pk

To achieve efficient inference here, it will be necessary to implement a Kronecker-factorised matrix type which has appropriate efficient representations.

[1] - Saatçi, Yunus. Scalable inference for structured Gaussian process models. Diss. University of Cambridge, 2012.
[2] - Evans, Trefor W., and Prasanth B. Nair. "Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF)."
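
To make the factorisation concrete, here is a small self-contained check (plain Julia, not Stheno types; the per-dimension kernels and grid sizes are arbitrary choices) that the covariance over a Cartesian-product grid is the Kronecker product of the per-dimension covariance matrices:

using LinearAlgebra

# Per-dimension kernels.
k1(a, a′) = exp(-abs2(a - a′) / 2)
k2(b, b′) = exp(-abs2(b - b′) / 0.5)

x1, x2 = randn(4), randn(3)

# Cartesian-product grid, with the first dimension varying fastest.
grid = [(a, b) for b in x2 for a in x1]

# Product kernel evaluated densely over the grid.
k((a, b), (a′, b′)) = k1(a, a′) * k2(b, b′)
K_full = [k(p, q) for p in grid, q in grid]

# Per-dimension covariance matrices.
K1 = [k1(a, a′) for a in x1, a′ in x1]
K2 = [k2(b, b′) for b in x2, b′ in x2]

# The dense covariance matrix is exactly the Kronecker product.
@assert K_full ≈ kron(K2, K1)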

Redesign of mean function / kernel / cross-kernel evaluation

Problem

There are currently specialised implementations of mean for each MeanFunction, and of cov and xcov for each Kernel and CrossKernel, potentially for a variety of scenarios. This proposal aims to sort out the interface and move towards a standard, more intuitive interface for new mean functions and kernels to implement, potentially removing some of the duplication currently in the code base.

Specification and Implementation

MeanFunctions / Kernels / CrossKernels must implement the following interface:

  • be callable. If you have a mean function, you should be able to invoke it on whatever it considers to be a single input (e.g. a Real)
  • colwise (a la Distances.jl) functionality.
  • pairwise (a la Distances.jl) functionality.

There will be fallbacks implemented for colwise and pairwise which just iterate over the data provided and require only that the mean function / kernel / cross-kernel is callable. It will be possible for any particular mean function / kernel / cross-kernel to override this functionality by implementing a custom version of colwise and / or pairwise. Indeed, it will often be preferable that a (cross-)kernel override the default pairwise implementation (e.g. EQ kernel).

mean, cov, and xcov will then simply have generic implementations in terms of the above. Additionally, it will be trivial to, for example, get only the diagonal / first row of a covariance matrix by the appropriate use of colwise.

Linear Combinations of Kernels

There are a variety of interesting optimisations that can be performed on kernels of the form

k(x, z) = w_1 * k_1(x, z) + w_2 * k_2(x, z) + ... + w_L * k_L(x, z)

A naive recursive implementation in terms of the current Sum and Scaled kernels hides opportunities for parallelism in the computation of each term, and the summation over terms.

Notable examples of kernels with this form include:

  • The stationary spectral mixture kernel [1]
  • Non-stationary spectral mixture kernels and generalisations of [1]: [2] [3]

[1] - Wilson, Andrew, and Ryan Adams. "Gaussian process kernels for pattern discovery and extrapolation." International Conference on Machine Learning. 2013.
[2] - Remes, Sami, Markus Heinonen, and Samuel Kaski. "Non-stationary spectral kernels." Advances in Neural Information Processing Systems. 2017
[3] - Samo, Yves-Laurent Kom, and Stephen Roberts. "Generalized spectral kernels." arXiv preprint arXiv:1506.02236 (2015).

GPU Support

Ensuring that Stheno supports all of its functionality on the GPU (where appropriate) is quite important moving forwards. This will have to be CuArrays.jl-based. To achieve this, we must ensure that:

  • All base mean functions and kernels support GPU inputs
  • All operations on all compositions are implemented in terms of operations that CuArrays can support.
  • logpdf, rand, elbo, etc work correctly with CuArrays.
  • Zygote works efficiently on all of the above.

For the most part, I expect this functionality to require writing quite a number of tests, and making tweaks where necessary.

All the Single-Output Kernels

Although the collection of base Kernels in kernels.jl is significantly broader than it used to be, there's still quite a number of kernels that are listed in, for example, the GPML manual that aren't implemented here. A starting list of un-implemented kernels includes:

  • Matern-7/2
  • Generalised Matern
  • Cosine
  • ArcCosine
  • Piecewise Polynomial (Compact Support)
  • Fractional Brownian Motion
  • Underdamped Linear Langevin
  • i-times integrated Wiener Process (we only support 0 and 1 I believe)
  • Integrated OU
  • Neural Net
  • Periodic without DC component
  • Cos
  • Gabor

Please feel free to add any kernels that appear to be missing from this list and to open PRs that implement any of the above / anything that has been missed.

The Internals section of the docs provides an explanation of how to go about implementing new kernels within Stheno.

Stationary Finite Kernels

It is currently assumed that a Finite Kernel is non-stationary. This isn't strictly the case, as Toeplitz covariance matrices are essentially the finite manifestation of a stationary kernel. This can be exploited for performance enhancements.

Convex Constrained GP

Is it possible to impose a convexity constraint on a Gaussian process? And is it possible to use Stheno to do this? I apologize if this is extremely off topic; I am brand new to both GPs in general and Stheno.

Faster computation of matrix inverses and determinants with discrete x

@brad-ross and I are working on a paper that uses Stheno for GP related computation.

If y = f(x) + e, f(x) ~ GP(m(x), k(x, x)), and x (known as the "running variable" in our setting) is discrete with d unique values, then we think there are ways to speed up the computation of posterior and logpdf significantly. Below is a rough writeup that describes how this would be done for inverting the matrix required to compute the estimator: it uses the matrix inversion lemma to change an O(n^3) operation into an O(n + nd + d^3) one. A similar derivation can be done to use the determinant lemma to reduce the computation of the logpdf to depend on d instead of n.

Inversion Speed-Up

@brad-ross is willing to make an attempt at implementing this, but also wanted see if there are any thoughts about this approach for speeding up computation (and suggestions again on code structure).
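
To make the idea concrete, here is a minimal numerical check of the inversion speed-up (plain Julia; the variable names and the specific kernel matrix are made up for illustration). With A the n×d one-hot matrix mapping each observation to its unique x value and K_d the d×d kernel matrix over those values, the matrix inversion lemma gives (A K_d A' + σ²I)⁻¹ using only d×d solves:

using LinearAlgebra

n, d = 1_000, 5
σ² = 0.1
y = randn(n)

# One-hot assignment of each observation to one of the d unique x values.
idx = rand(1:d, n)
A = Float64.(idx .== (1:d)')

# Kernel matrix over the d unique values (any PSD matrix will do here).
B = randn(d, d)
K_d = B * B' + I

# Direct solve with the dense n×n covariance: O(n³).
z_direct = (A * K_d * A' + σ² * I) \ y

# Woodbury-based solve: only d×d factorisations, O(nd + d³).
M = inv(K_d) + (A' * A) ./ σ²
z_woodbury = y ./ σ² - (A * (M \ (A' * y))) ./ σ²^2

@assert z_direct ≈ z_woodbury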

Flux integration + Neural Kernel Network

Relates to this issue.

A MWE can be found in the examples directory on this branch. It contains a basic demo of

  • how to compose a Flux model with a Stheno model (in the literal sense of composition)
  • a very basic, but working, NKN implementation. It's missing a load of stuff, but it gives us something to talk around.

The examples are all packaged nicely in a project, so it should be straightforward to get everything up and running.

The questions now are

  • have I covered all of the functionality of interest to GPFlux?
  • does this miss any important interface requirements?

@HamletWantToCode what do you think?

Sparse Approximations and Non-Gaussian Likelihoods

As a baseline, Stheno really needs to have [1] implemented. This will be very simple to achieve, and will likely only be a slight extension of the implementation of [2], the elbo for which can be found here.

It also seems sensible for Stheno to have at least a baseline implementation that can handle non-Gaussian likelihoods, even if it's not the fanciest most hot-off-the-press thing going. For example, [3] implemented with Monte Carlo sampling to estimate the intractable likelihood expectations would be a good place to start, or perhaps an implementation of [4], which also shouldn't be too hard.

The most important component of each of these is an elbo function. For models with Gaussian likelihoods, it's probably sufficient with a similar interface to the current interface:

elbo(f::FiniteGP, y::AV{<:Real}, u::FiniteGP)

but perhaps with some extra arguments to declare the size of the mini-batch and the number of Monte Carlo samples to use. A simple extension of the FiniteGP type would probably be the best bet for non-conjugate likelihoods - you would just need a collection of types that specify what likelihood you want to use, and how it's parametrised in terms of the GP in question. Something quick and dirty here would be the best bet - it would just be helpful to have the functionality, even if the interface isn't ideal.

For those methods that demand natural gradients, I imagine the interface will follow naturally once the elbo interface has been determined.

This issue shouldn't be a lot of work for someone who knows the literature below reasonably well.

[1] - Hensman, James, Nicolo Fusi, and Neil D. Lawrence. "Gaussian processes for big data." arXiv preprint arXiv:1309.6835 (2013).
[2] - Titsias, Michalis. "Variational learning of inducing variables in sparse Gaussian processes." Artificial Intelligence and Statistics. 2009.
[3] - Hensman, James, Alexander Matthews, and Zoubin Ghahramani. "Scalable variational Gaussian process classification." (2015).
[4] - Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. "Natural gradients in practice: Non-conjugate variational inference in Gaussian process models." arXiv preprint arXiv:1803.09151 (2018).

Zero and Constant optimisations

The current implementation of cov for the Zero and Constant kernels returns a dense matrix. Clearly, this is sub-optimal. This is straightforwardly resolved by introducing specialised types for Zero and Constant matrices.
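
For instance, FillArrays.jl (already mentioned elsewhere in this tracker) provides lazy array types with exactly this structure; a small sketch of the idea:

using FillArrays

n, m = 1_000, 500

# O(1)-storage representations of zero and constant covariance blocks.
Z = Zeros(n, m)       # all-zero matrix, no per-entry storage
C = Fill(2.5, n, m)   # constant matrix with every entry equal to 2.5

# They still behave like ordinary matrices where needed.
size(Z), sum(C)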

Periodic kernel optimisation

The generic implementation of the periodic kernel projects a vector of scalar inputs into two dimensions according to the transform

t::AV->hcat(cos.((2π * f) .* t), sin.((2π * f) .* t))

This is sensible in the general case. However, there is at least one special case of which I am aware when we can optimise: the EQ kernel. In this case we should return a specialised PeriodicEQ kernel which exploits the simple algebraic form which can be shown to result from the application of this transform to inputs to an EQ kernel.
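
For reference, a quick numerical check (plain Julia; the frequency and length scale are arbitrary) that applying the EQ kernel to this embedding recovers the familiar closed-form periodic kernel exp(-2 sin²(π f (t - t′)) / l²):

# EQ kernel on the 2-D embedded inputs vs. the closed-form periodic kernel.
f, l = 0.7, 1.3
embed(t) = [cos(2π * f * t), sin(2π * f * t)]
eq(u, u′) = exp(-sum(abs2, u - u′) / (2 * l^2))
periodic_eq(t, t′) = exp(-2 * sin(π * f * (t - t′))^2 / l^2)

t, t′ = 0.3, 1.9
@assert eq(embed(t), embed(t′)) ≈ periodic_eq(t, t′)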

There are several things to do here:

  • Complete #5 as it blocks this issue.
  • Investigate whether the EQ is the only case in which things simplify, or whether there are a class of kernels for which similar optimisations can be applied.
  • Implement the specialised kernel and add the correct hooks for it to be used.

Warnings when plotting GPs

I am getting the following warning when plotting a GP using Plots and the 'samples' keyword:

"
Warning: Attribute alias width detected in the user recipe defined for the signature

(::Stheno.FiniteGP{Stheno.CompositeGP{Tuple{typeof(|),GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Array{Float64,1},Stheno.Stretched{Array{Float64,1},EQ,typeof(identity)},typeof(identity)}},LinearAlgebra.Cholesky{Float64,Array{Float64,2}},Array{Float64,1},Stheno.FiniteGP{GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Array{Float64,1},Stheno.Stretched{Array{Float64,1},EQ,typeof(identity)},typeof(identity)}},StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}},Array{Float64,1}}},StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}}).

To ensure expected behavior it is recommended to use the default attribute linewidth.
"

Uncertainty propagation (possible feature request)

Thanks for this excellent package. I am wondering whether it supports uncertainty propagation, that is, when the inputs to the GP themselves have uncertainty (e.g., being Gaussian distributed). I did not find relevant information in the documentation.

There are quite a few papers on this topic, like Agathe Girard's PhD thesis. This feature is particularly useful if we want to use a GP for multi-step-ahead predictions, e.g., in a model predictive control context. The prediction at t+2 depends on the previous prediction at t+1.

Numerical issues

Stheno is great and I'm enjoying it a lot so far.

In some cases I've found that it's easy to run into numerical issues when getting the logpdf, when the distance between x values is smaller than the length scale:

julia> using Stheno, LinearAlgebra

julia> f = GP(EQ(), GPC())
GP{Stheno.ZeroMean{Float64},EQ}(Stheno.ZeroMean{Float64}(), EQ(), 1, GPC(1))

julia> x = range(0, 1, length = 15)
0.0:0.02040816326530612:1.0

julia> cholesky(cov(f(x)))
ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

AFAIU the covariance matrix should be positive definite by construction since the GP kernel is. It seems like the problem may be that in this case the eigenvalues are too small – not actually zero, but small enough to confuse the cholesky routine. (Does that sound right?)

Is there any more direct or numerically stable way to get the GP's logpdf that could avoid this issue?
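
One common workaround, mirroring what the README examples above do when sampling from posteriors, is to pass a small observation variance ("jitter") when constructing the FiniteGP; a minimal sketch using the same API as the snippet above:

using Stheno

f = GP(EQ(), GPC())
x = range(0, 1, length = 15)

# A small diagonal jitter keeps the covariance numerically positive definite.
fx = f(x, 1e-9)
y = rand(fx)
logpdf(fx, y)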

Bayesian Optimization Example

Can you point me to an example, using Stheno, of Bayesian Optimization with an objective that considers exploration and exploitation?

cumulsizes not defined

I have added Stheno v0.3.2 in Julia 1.4.1, and typing using Stheno produces the following error:

julia> using Stheno
[ Info: Precompiling Stheno [8188c328-b5d6-583d-959b-9690869a5511]
ERROR: LoadError: LoadError: UndefVarError: cumulsizes not defined
Stacktrace:
 [1] include(::Module, ::String) at ./Base.jl:377
 [2] include(::String) at /Users/ferrari/.julia/packages/Stheno/m0I3t/src/Stheno.jl:1
 [3] top-level scope at /Users/ferrari/.julia/packages/Stheno/m0I3t/src/Stheno.jl:32
 [4] include(::Module, ::String) at ./Base.jl:377
 [5] top-level scope at none:2
 [6] eval at ./boot.jl:331 [inlined]
 [7] eval(::Expr) at ./client.jl:449
 [8] top-level scope at ./none:3
in expression starting at /Users/ferrari/.julia/packages/Stheno/m0I3t/src/util/block_arrays/dense.jl:4
in expression starting at /Users/ferrari/.julia/packages/Stheno/m0I3t/src/Stheno.jl:32
ERROR: Failed to precompile Stheno [8188c328-b5d6-583d-959b-9690869a5511] to /Users/ferrari/.julia/compiled/v1.4/Stheno/sVmvC_9B77L.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922

Method ambiguity on 0.7

Running the first sample code block on 0.7 gives:

ERROR: MethodError: *(::LinearAlgebra.Adjoint{Float64,LinearAlgebra.UpperTriangular{Float64,BlockArrays.BlockArray{Float64,2,AbstractArray{Float64,2}}}}, ::BlockArrays.BlockArray{Float64,2,Array{Float64,2}}) is ambiguous. Candidates:
  *(A::AbstractArray{T,2}, B::BlockArrays.AbstractBlockArray{T,2}) where T in Stheno at C:\Users\abcd\.julia\dev\Stheno\src\util\block_arrays.jl:280
  *(adjA::LinearAlgebra.Adjoint{#s608,#s607} where #s607<:LinearAlgebra.AbstractTriangular where #s608, B::AbstractArray{T,2} where T) in LinearAlgebra at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v0.7\LinearAlgebra\src\triangular.jl:1813
Possible fix, define
  *(::LinearAlgebra.Adjoint{T,#s607} where #s607<:LinearAlgebra.AbstractTriangular, ::BlockArrays.AbstractBlockArray{T,2})
Stacktrace:
 [1] rand(::Random.MersenneTwister, ::BlockGP{Array{GP{FiniteMean,FiniteKernel},1}}, ::Int64) at C:\Users\abcd\.julia\dev\Stheno\src\gp\block_gp.jl:35
 [2] rand(::Random.MersenneTwister, ::Array{GP{FiniteMean,FiniteKernel},1}, ::Int64) at C:\Users\abcd\.julia\dev\Stheno\src\gp\block_gp.jl:46
 [3] rand(::Random.MersenneTwister, ::Array{GP{FiniteMean,FiniteKernel},1}) at C:\Users\abcd\.julia\dev\Stheno\src\gp\block_gp.jl:45

The error is triggered by the call to rand.
I tried a hacky fix:

# block_arrays.jl, after line 280
*(A::LinAlg.Adjoint{T,M} where M<:LinAlg.AbstractTriangular, B::ABM{T}) where T = BlockMatrix([A]) * B

And that seemed to take care of the ambiguity but then I got the following on re-running the code sample:

ERROR: AssertionError: are_conformal(A, B)
Stacktrace:
 [1] *(::BlockArrays.BlockArray{Float64,2,Array{Float64,2}}, ::BlockArrays.BlockArray{Float64,2,Array{Float64,2}}) at C:\Users\abcd\.julia\dev\Stheno\src\util\block_arrays.jl:268
 [2] *(::LinearAlgebra.Adjoint{Float64,LinearAlgebra.UpperTriangular{Float64,BlockArrays.BlockArray{Float64,2,AbstractArray{Float64,2}}}}, ::BlockArrays.BlockArray{Float64,2,Array{Float64,2}}) at C:\Users\abcd\.julia\dev\Stheno\src\util\block_arrays.jl:281
 [3] rand(::Random.MersenneTwister, ::BlockGP{Array{GP{FiniteMean,FiniteKernel},1}}, ::Int64) at C:\Users\abcd\.julia\dev\Stheno\src\gp\block_gp.jl:35
 [4] rand(::Random.MersenneTwister, ::Array{GP{FiniteMean,FiniteKernel},1}, ::Int64) at C:\Users\abcd\.julia\dev\Stheno\src\gp\block_gp.jl:46
 [5] rand(::Random.MersenneTwister, ::Array{GP{FiniteMean,FiniteKernel},1}) at C:\Users\abcd\.julia\dev\Stheno\src\gp\block_gp.jl:45
 [6] top-level scope at none:0

It would be a good starting point to know which method specialisation we would like to go with here (block_array.jl or triangular.jl).

Length scale gradient NaN when x not fully continuous

If I get the gradient of the logpdf using Zygote, the gradient of the length scale parameter is NaN if x has any repeated values. This is related to #120, and might be fixed by the suggested implementation there, but I wanted to note this in another issue to show that the issues around #120 are not purely computational. See the code that generates this issue below. The first println(gradient) works fine, as per the tutorial, but once some x values are repeated, the gradient of the length scale is NaN.

using Zygote: gradient
using Stheno
using Optim: optimize, BFGS, minimizer

function unpack(θ)
    σ² = exp(θ[1]) + 1e-6
    l = exp(θ[2]) + 1e-6
    σ²_n = exp(θ[3]) + 1e-6
    return σ², l, σ²_n
end

# x is continuous 
init = randn(3)
x = randn(100); y= randn(100)

function nlml(θ)
    σ², l, σ²_n = unpack(θ)
    k = σ² * stretch(Matern52(), l)
    f = GP(k, GPC())
    return -logpdf(f(x, σ²_n), y)
end
println(gradient(nlml, init)[1])

# some repeated values of x  
x = vcat(x, x); y = randn(200)
function nlml(θ)
    σ², l, σ²_n = unpack(θ)
    k = σ² * stretch(Matern52(), l)
    f = GP(k, GPC())
    return -logpdf(f(x, σ²_n), y)
end
println(gradient(nlml, init)[1])

Output of the first print statement:

julia> println(gradient(nlml, init)[1])
[-1.0713468383523934, -12.447387099066987, -131.494269354701]

Output of the second print statement:

println(gradient(nlml, init)[1])
[0.9055697178496014, NaN, -294.0797397890903]

I wonder if this is related to numerical instabilities in the Cholesky decomposition when K(x,x) is not full rank? But overall, this seems like a bug. It can be addressed by adding some small jitter to the x values so that there is never a repeated x value in the dataset, but it seems like in many real-world applications it is reasonable to expect some discrete x's.

Examples don't work

The example in the readme doesn't run, and neither do any of the model zoo examples. I guess things have changed since they were made, like eq() being EQ(), but it would be nice to have some working basic examples of how to actually use the library. Especially for someone like me who is new to Julia, it's very hard to see how things work and fit together.

Mistake in the logdet

There is a mistake here:
function logpdf(f::FiniteGP, y::AbstractVector{<:Real})
    μ, C = mean(f), cholesky(Symmetric(cov(f)))
    return -(length(y) * log(2π) + logdet(C) + Xt_invA_X(C, y - μ)) / 2
end

Logdet(C) should not be divided by two since it is already the Cholesky derivative.

Bijector not defined for FiniteGP

I found this when trying to run the Turing integration example. When I attempted to sample from the model (line 78), the error below was produced.

Not sure if this belongs here or in Bijectors...I can open an issue there if that's more appropriate.

MethodError: no method matching bijector(::Stheno.FiniteGP{GP{Stheno.ZeroMean{Float64},Stheno.Scaled{Array{Float64,1},Stheno.Stretched{Array{Float64,1},EQ,typeof(identity)},typeof(identity)}},ColVecs{Float64,Array{Float64,2}},LinearAlgebra.Diagonal{Float64,FillArrays.Fill{Float64,1,Tuple{Base.OneTo{Int64}}}}})
Closest candidates are:
  bijector(!Matched::KSOneSided) at C:\Users\sam.urmy\.julia\packages\Bijectors\bHaf6\src\transformed_distribution.jl:58
  bijector(!Matched::Product{Discrete,T,V} where V<:AbstractArray{T,1} where T<:Distribution{Univariate,Discrete}) at C:\Users\sam.urmy\.julia\packages\Bijectors\bHaf6\src\transformed_distribution.jl:39
  bijector(!Matched::Product{Continuous,T,Tdists} where Tdists<:(FillArrays.Fill{T,1,Axes} where Axes) where T<:Distribution{Univariate,Continuous}) at C:\Users\sam.urmy\.julia\packages\Bijectors\bHaf6\src\compat\distributionsad.jl:16
  ...
in top-level scope at untitled:78
in sample at Turing\GMBTf\src\inference\Inference.jl:153
in #sample#1 at Turing\GMBTf\src\inference\Inference.jl:153 
in sample at Turing\GMBTf\src\inference\Inference.jl:163 
in #sample#2 at Turing\GMBTf\src\inference\Inference.jl:163
in Sampler at Turing\GMBTf\src\inference\hmc.jl:376 
in DynamicPPL.Sampler at Turing\GMBTf\src\inference\hmc.jl:384
in Turing.Inference.HMCState at Turing\GMBTf\src\inference\hmc.jl:609
in #HMCState#58 at Turing\GMBTf\src\inference\hmc.jl:612
in link! at DynamicPPL\9OFG0\src\varinfo.jl:697 
in _link! at DynamicPPL\9OFG0\src\varinfo.jl:700
in macro expansion at DynamicPPL\9OFG0\src\varinfo.jl:709 
in link at Bijectors\bHaf6\src\Bijectors.jl:118
