johngoertz / gumbi
Gaussian Process Model Building Interface
Home Page: https://JohnGoertz.github.io/Gumbi/
License: Apache License 2.0
Current pattern:
```
plt.sca(ax)
pp = gmb.ParrayPlotter(X, Y, z)
pp(plt.contourf, levels=20, cmap='pink', norm=norm)
pp.colorbar(ax=ax)
```
Desired pattern:
```
pp = gmb.ParrayPlotter(X, Y, z)
pp(plt.contourf, levels=20, cmap='pink', norm=norm, ax=ax)
pp.colorbar(ax=ax)
```
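One possible way to support the desired pattern is for the plotting call to accept an optional `ax` keyword, make that Axes current with `plt.sca`, and then delegate to the pyplot function. The sketch below uses a hypothetical standalone helper, `plot_on`, not Gumbi's actual `ParrayPlotter` code:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_on(plot_fn, *args, ax=None, **kwargs):
    """Call a pyplot function (e.g. plt.contourf) on a specific Axes.

    Hypothetical helper: if `ax` is given it is made the current Axes first;
    otherwise the function draws on whatever Axes is already current.
    """
    if ax is not None:
        plt.sca(ax)  # route the following pyplot call to `ax`
    return plot_fn(*args, **kwargs)


fig, (left, right) = plt.subplots(1, 2)
# Without `ax`, pyplot would draw on the most recently created Axes (`right`).
plot_on(plt.plot, [0, 1], [0, 1], ax=left)
print(len(left.lines), len(right.lines))  # 1 0
```

An alternative design is to call the Axes method (e.g. `ax.contourf`) instead of the pyplot function, which avoids touching pyplot's global "current Axes" state.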
Hi @JohnGoertz
I began testing Gumbi on Windows 10 and on Windows 10 WSL2. From a compiler point of view, it appears my Windows 10 configuration still needs some work. On WSL2 (Ubuntu 20.04 LTS with gcc 9.3.0), the basic example (see the code below) runs without any errors.
```
gp.fit(outputs=['mpg'], continuous_dims=['horsepower'])
X = gp.prepare_grid()
y = gp.predict_grid()
gmb.ParrayPlotter(X, y).plot()
sns.scatterplot(data=cars, x='horsepower', y='mpg', color=sns.cubehelix_palette()[-1], alpha=0.5);
```
Based on your suggestion at https://discourse.pymc.io/t/introducing-gumbi-the-gaussian-process-model-building-interface/8377/5, I next tested the Multi-Output Regression example code with the Cars dataset.
```
gp.fit(outputs=['mpg', 'acceleration'], continuous_dims=['horsepower']);
X = gp.prepare_grid()
Y = gp.predict_grid()
axs = plt.subplots(2,1, figsize=(6, 8))[1]
for ax, output in zip(axs, gp.outputs):
    y = Y.get(output)
    gmb.ParrayPlotter(X, y).plot(ax=ax)
    sns.scatterplot(data=cars, x='horsepower', y=output, color=sns.cubehelix_palette()[-1], alpha=0.5, ax=ax);
```
I get the following `AssertionError`; the full traceback is below:
```
AssertionError Traceback (most recent call last)
Input In [8], in <module>
----> 1 gp.fit(outputs=['mpg', 'acceleration'], continuous_dims=['horsepower'])
3 X = gp.prepare_grid()
4 Y = gp.predict_grid()
File ~/miniconda3/envs/gumbi_env/lib/python3.10/site-packages/gumbi/regression/GP_pymc3.py:292, in GP.fit(self, outputs, linear_dims, continuous_dims, continuous_levels, continuous_coords, categorical_dims, categorical_levels, additive, seed, heteroskedastic_inputs, heteroskedastic_outputs, sparse, n_u, **MAP_kwargs)
234 """Fits a GP surface
235
236 Parses inputs, compiles a Pymc3 model, then finds the MAP value for the hyperparameters. `{}_dims` arguments
(...)
284 self : :class:`GP`
285 """
287 self.specify_model(outputs=outputs, linear_dims=linear_dims, continuous_dims=continuous_dims,
288 continuous_levels=continuous_levels, continuous_coords=continuous_coords,
289 categorical_dims=categorical_dims, categorical_levels=categorical_levels,
290 additive=additive)
--> 292 self.build_model(seed=seed,
293 heteroskedastic_inputs=heteroskedastic_inputs,
294 heteroskedastic_outputs=heteroskedastic_outputs,
295 sparse=sparse, n_u=n_u)
297 self.find_MAP(**MAP_kwargs)
299 return self
File ~/miniconda3/envs/gumbi_env/lib/python3.10/site-packages/gumbi/regression/GP_pymc3.py:363, in GP.build_model(self, seed, continuous_kernel, heteroskedastic_inputs, heteroskedastic_outputs, sparse, n_u)
360 n_p = len(self.outputs)
362 D_in = len(self.dims)
--> 363 assert X.shape[1] == D_in
365 idx_l = [self.dims.index(dim) for dim in self.linear_dims]
366 idx_s = [self.dims.index(dim) for dim in self.continuous_dims]
AssertionError:
```
What am I doing wrong? Or is this related to any of the `numpy > 1.19.3` errors (it does not look like it)?
Sree
Thanks a lot for creating Gumbi. I have been playing around with it a bit and I hope it will evolve further!
What I was trying to do is explained here: https://discourse.pymc.io/t/use-exact-gaussian-process-model-from-gpytorch-as-emulator-in-pymc3/8680. Do you think I could use Gumbi to do something similar to what is done in GPyTorch (https://docs.gpytorch.ai/en/stable/examples/01_Exact_GPs/Simple_GP_Regression.html)?
If so, is there a simple way to export the fitted Gumbi model so that its `predict` method can be used inside "pure" PyMC3 again (i.e., Aesara-compatible, so that it can run with the NUTS sampler)?
Apart from that, I have another question: I was able to fit a GP to my data with Gumbi, but I could not really check its performance. Do you have example code that uses the `cross_validate` method? I did not manage to get it working. I ran:
```
gp.cross_validate(['melt_f', 'prcp_fac', 'temp_bias'], n_train=200)
```
but got `TypeError: __init__() missing 1 required positional argument: 'outputs'`. However, when I run `gp.outputs`, I get the correct output name.
Thanks a lot in advance!
The behavior of the `cross_validate` method may be confusing; a notebook or expanded Examples in the docstring would help.
Current pattern:
```
gmb.Standardizer(y={'μ':y_train.mean(), 'σ':y_train.std()})
```
Desired pattern:
```
gmb.Standardizer(y=y_train)
```
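A minimal sketch of how a constructor could accept either the current `{'μ': ..., 'σ': ...}` dict or raw data, computing the moments itself. `standardizer_moments` is a hypothetical helper, not part of Gumbi. It uses NumPy's default population standard deviation (`ddof=0`); note that pandas' `.std()` defaults to `ddof=1`, so the two can differ slightly if `y_train` is a pandas object:

```python
import numpy as np


def standardizer_moments(spec):
    """Return (mean, std) from either a {'μ', 'σ'} dict or raw data.

    Hypothetical helper illustrating the desired pattern; not Gumbi's API.
    """
    if isinstance(spec, dict):
        # Current pattern: moments supplied explicitly.
        return float(spec["μ"]), float(spec["σ"])
    # Desired pattern: compute the moments from the data.
    values = np.asarray(spec, dtype=float)
    return float(values.mean()), float(values.std())  # ddof=0


y_train = np.array([1.0, 3.0, 1.0, 3.0])
print(standardizer_moments(y_train))                # (2.0, 1.0)
print(standardizer_moments({"μ": 2.0, "σ": 1.0}))   # (2.0, 1.0)
```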
Right now Gumbi only allows (PyMC) marginalized posterior predictions, i.e. only the mean and variance rather than individual samples. We should also implement an interface for drawing individual posterior samples via `.conditional`.
Gumbi exposes the PyMC API, so for now the user can access the underlying PyMC objects to do this:
```
import gumbi as gmb
import pymc3 as pm

gp = gmb.GP(...)
gp.fit(...)
gp.prepare_grid()
# add the GP conditional to the model, given the new X values
with model:
    f_pred = gp.conditional("f_pred", gp.grid_points)
# To use the MAP values, replace the trace with a length-1 list containing the MAP point
with model:
    pred_samples = pm.sample_posterior_predictive([gp.MAP], var_names=["f_pred"], samples=2000)
```
But this approach obviously introduces complexity that Gumbi was intended to remove. In particular:
Adding `f_pred` to the model and then drawing samples should be reduced to a single command, maybe `gp.draw(samples=2000, point='MAP')`.
`f_pred` should be declared as a `pm.Data` object so that its value can be updated repeatedly, similar to the suggestion here. This should probably be done pre-emptively during initial model building.
`pred_samples` should be reshaped and stored as a `Parray`, similar to how `predict` behaves. This will be slightly complicated by the fact that `pred_samples` will have an additional dimension compared to `gp.grid_points`, corresponding to different samples.
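To illustrate the shape bookkeeping involved, the sketch below draws joint samples of a latent function over a 2-D grid from a multivariate normal, then reshapes them so the sample axis comes first, followed by the grid dimensions. The grid sizes, zero mean, and squared-exponential covariance are assumptions for illustration; this is plain NumPy, not Gumbi's or PyMC's sampling code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D prediction grid, flattened to one row per grid point.
n_x, n_y = 4, 3
gx, gy = np.meshgrid(np.linspace(0, 1, n_x), np.linspace(0, 1, n_y), indexing="ij")
grid_points = np.column_stack([gx.ravel(), gy.ravel()])  # (n_x * n_y, 2)

# Assumed posterior mean and squared-exponential covariance over the grid.
mean = np.zeros(len(grid_points))
sq_dists = ((grid_points[:, None, :] - grid_points[None, :, :]) ** 2).sum(-1)
cov = np.exp(-0.5 * sq_dists / 0.3**2) + 1e-8 * np.eye(len(grid_points))

# Joint draws: one row per sample, one column per grid point.
n_samples = 2000
flat = rng.multivariate_normal(mean, cov, size=n_samples)  # (2000, 12)

# Restore the grid shape while keeping the extra leading sample dimension.
samples = flat.reshape(n_samples, n_x, n_y)
print(samples.shape)  # (2000, 4, 3)
```

Because `grid_points` was built by flattening `(n_x, n_y)` arrays in C order, the same C-order `reshape` maps each column of `flat` back to its grid cell.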
`ParrayPlotter` should potentially be updated to accommodate the extra sample dimension; otherwise the user might need to create a new `ParrayPlotter` instance for each sample.
The core functionality of Gumbi is contained in the abstract base class `Regressor`. It was written to allow simple extension with custom models and inference methods by defining the abstract methods `fit`, `build_model`, and `predict`. There should be a notebook demonstrating how this could be achieved, potentially with a Bayesian neural network or a generalized linear model.
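A minimal sketch of the extension pattern described above, using Python's `abc` module. `MiniRegressor` and `OLSRegressor` are hypothetical stand-ins: Gumbi's real `Regressor` has its own constructor and method signatures, which this sketch does not reproduce. The subclass fits ordinary least squares, i.e. a Gaussian GLM with an identity link:

```python
from abc import ABC, abstractmethod

import numpy as np


class MiniRegressor(ABC):
    """Hypothetical base class: subclasses supply the model-specific pieces."""

    @abstractmethod
    def build_model(self, X, y):
        """Set up whatever the model needs before fitting."""

    @abstractmethod
    def fit(self, X, y):
        """Estimate parameters and return self."""

    @abstractmethod
    def predict(self, X):
        """Return predictions for new inputs."""


class OLSRegressor(MiniRegressor):
    """Linear model with an intercept, fitted by least squares."""

    def build_model(self, X, y):
        X = np.asarray(X, dtype=float)
        # Design matrix with a leading column of ones for the intercept.
        self.design = np.column_stack([np.ones(len(X)), X])
        self.target = np.asarray(y, dtype=float)

    def fit(self, X, y):
        self.build_model(X, y)
        self.coef, *_ = np.linalg.lstsq(self.design, self.target, rcond=None)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.column_stack([np.ones(len(X)), X]) @ self.coef


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X[:, 0]
model = OLSRegressor().fit(X, y)
print(model.predict(np.array([[4.0]])))  # approximately [9.]
```

Instantiating `MiniRegressor` directly raises `TypeError`, so any new model is forced to implement all three methods.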
Current pattern:
```
gmb.uparray('y', μ=y_out.mean(axis=1), σ2=y_out.var(axis=1), stdzr=stdzr)
```
Desired pattern:
```
gmb.uparray(y=y_out, axis=1, stdzr=stdzr)
```
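For reference, a sketch of the reduction the desired pattern would perform internally: summarizing repeated values along `axis` into the mean and variance that the current pattern computes by hand. `summarize_along` is a hypothetical helper, and the variance uses NumPy's default `ddof=0`, matching `y_out.var(axis=1)`:

```python
import numpy as np


def summarize_along(y, axis):
    """Hypothetical helper: per-entry mean and variance after reducing `axis`."""
    y = np.asarray(y, dtype=float)
    return {"μ": y.mean(axis=axis), "σ2": y.var(axis=axis)}


# Rows are points; columns are repeated draws/replicates for each point.
y_out = np.array([[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]])
stats = summarize_along(y_out, axis=1)
print(stats["μ"], stats["σ2"])  # [2. 2. 2.] [1. 0. 4.]
```

The two arrays correspond to the `μ` and `σ2` arguments passed explicitly in the current pattern.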
Command:
```
#!/bin/bash -eo pipefail
python docs/source/generate_api_rst.py
```
Error:
```
Traceback (most recent call last):
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/configparser.py", line 238, in fetch_val_for_key
    return self._theano_cfg.get(section, option)
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/configparser.py", line 781, in get
    d = self._unify_values(section, vars)
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/configparser.py", line 1152, in _unify_values
    raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'blas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/configparser.py", line 354, in __get__
    val_str = cls.fetch_val_for_key(self.name, delete_key=delete_key)
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/configparser.py", line 242, in fetch_val_for_key
    raise KeyError(key)
KeyError: 'blas__ldflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/circleci/project/docs/source/generate_api_rst.py", line 9, in <module>
    import gumbi
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/gumbi/__init__.py", line 5, in <module>
    from .regression import *
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/gumbi/regression/__init__.py", line 1, in <module>
    from .GP_pymc3 import GP
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/gumbi/regression/GP_pymc3.py", line 6, in <module>
    import pymc3 as pm
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/pymc3/__init__.py", line 23, in <module>
    import theano
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/__init__.py", line 83, in <module>
    from theano import scalar, tensor
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/tensor/__init__.py", line 20, in <module>
    from theano.tensor import nnet # used for softmax, sigmoid, etc.
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/tensor/nnet/__init__.py", line 3, in <module>
    from . import opt
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/tensor/nnet/opt.py", line 32, in <module>
    from theano.tensor.nnet.conv import ConvOp, conv2d
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/tensor/nnet/conv.py", line 20, in <module>
    from theano.tensor import blas
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/tensor/blas.py", line 163, in <module>
    from theano.tensor.blas_headers import blas_header_text, blas_header_version
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/tensor/blas_headers.py", line 1016, in <module>
    if not config.blas__ldflags:
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/configparser.py", line 358, in __get__
    val_str = self.default()
  File "/home/circleci/.pyenv/versions/3.9.10/lib/python3.9/site-packages/theano/link/c/cmodule.py", line 2621, in default_blas_ldflags
    blas_info = numpy.distutils.__config__.blas_opt_info
AttributeError: module 'numpy.distutils.__config__' has no attribute 'blas_opt_info'

Exited with code exit status 1
CircleCI received exit code 1
```
Sometimes it's simpler to ignore the "standardizer" functionality of *Parrays and treat the variable(s) as zero-mean and unit-variance.
Current pattern:
```
gmb.uparray('y', μ=y_out.mean(axis=1), σ2=y_out.var(axis=1), stdzr=stdzr)
```
Desired pattern:
```
gmb.uparray('y', μ=y_out.mean(axis=1), σ2=y_out.var(axis=1))  # Internally creates a default Standardizer() instance
```
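A common way to implement an optional argument like this is a `None` default that is replaced inside the function body, which avoids sharing a single default instance across every call. The names below (`IdentityStandardizer`, `make_uparray`) are hypothetical stand-ins for Gumbi's `Standardizer` and `uparray`, shown only to illustrate the pattern; the identity standardizer treats each variable as zero-mean and unit-variance:

```python
import numpy as np


class IdentityStandardizer:
    """Hypothetical standardizer that assumes zero mean and unit variance."""

    def __init__(self, mean=0.0, std=1.0):
        self.mean = mean
        self.std = std

    def transform(self, values):
        return (np.asarray(values, dtype=float) - self.mean) / self.std


def make_uparray(name, mu, sigma2, stdzr=None):
    """Hypothetical constructor: fall back to a fresh default standardizer."""
    if stdzr is None:
        stdzr = IdentityStandardizer()  # new instance per call, never shared
    return {"name": name, "μ": np.asarray(mu), "σ2": np.asarray(sigma2), "stdzr": stdzr}


a = make_uparray("y", [1.0, 2.0], [0.1, 0.2])
b = make_uparray("y", [3.0], [0.3])
print(a["stdzr"] is b["stdzr"])          # False
print(a["stdzr"].transform(a["μ"]))      # [1. 2.]
```

Writing `stdzr=IdentityStandardizer()` in the signature instead would evaluate the default once, so every call would share (and could mutate) the same object.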
Given a list of UPArrays, `y_upas`, we can find the total expectation/variance as:
```
import numpy as np
import gumbi as gmb

μs = np.stack([y.μ for y in y_upas])
σ2s = np.stack([y.σ2 for y in y_upas])
total_upa = gmb.uparray('y',
                        μ=μs.mean(0),
                        σ2=μs.var(0) + σ2s.mean(0),
                        stdzr=stdzr
                        )
```
Implement as something like `gmb.uparray.total(y_upas)`, where `name` and `stdzr` are inferred from, e.g., the first UPArray in the list.
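The formula above is the law of total variance with equal weight on each UPArray: the total mean is the average of the means, and the total variance is the variance of the means plus the mean of the variances. A minimal NumPy sketch of what a `total` classmethod could compute; `MomentArray` is a hypothetical stand-in that only carries `name`, `μ`, and `σ2`, and the standardizer is omitted:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MomentArray:
    """Hypothetical stand-in for a UPArray: a name plus mean and variance arrays."""

    name: str
    μ: np.ndarray
    σ2: np.ndarray

    @classmethod
    def total(cls, arrays):
        """Combine equally weighted components via the law of total variance."""
        μs = np.stack([a.μ for a in arrays])
        σ2s = np.stack([a.σ2 for a in arrays])
        # E[y] = mean of means; Var[y] = variance of means + mean of variances.
        return cls(name=arrays[0].name, μ=μs.mean(0), σ2=μs.var(0) + σ2s.mean(0))


parts = [
    MomentArray("y", np.array([0.0, 1.0]), np.array([1.0, 1.0])),
    MomentArray("y", np.array([2.0, 1.0]), np.array([1.0, 3.0])),
]
combined = MomentArray.total(parts)
print(combined.μ, combined.σ2)  # [1. 1.] [2. 2.]
```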
Gumbi should be able to provide a (PyMC) `Latent` GP implementation that enables regression with non-Normal likelihoods. This would allow use cases such as these PyMC examples with a `StudentT` likelihood for regression and a `Bernoulli` likelihood for classification.
The best way will probably be to have a function that returns the `gp.prior` object, to which the user can then attach their desired likelihood and any additional variables. The code pattern will probably look like this, but speak up, users, if you have opinions!
```
gp = gmb.GP(...)
gp.build_model(..., Latent=True)
with gp.model:
    f = gp.prior
    # logit link and Bernoulli likelihood
    p = pm.Deterministic("p", pm.math.invlogit(f))
    y_ = pm.Bernoulli("y", p=p, observed=y)
gp.sample(...)
```
Prediction will obviously depend on #13.
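To make the proposed link concrete: given latent GP values `f`, the model above sets `p = invlogit(f)` and the `Bernoulli` term scores each binary label with `y log p + (1 - y) log(1 - p)`. A NumPy sketch of that log-likelihood, independent of PyMC; the input values are illustrative only:

```python
import numpy as np


def bernoulli_logit_loglik(f, y):
    """Total log-likelihood of binary labels y given latent values f.

    With p = invlogit(f) = 1 / (1 + exp(-f)):
    log p = -log(1 + exp(-f)) and log(1 - p) = -log(1 + exp(f)).
    np.logaddexp(0, x) computes log(1 + exp(x)) without overflow.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(-y * np.logaddexp(0.0, -f) - (1.0 - y) * np.logaddexp(0.0, f))


# At f = 0 the link gives p = 0.5, so each observation contributes log(0.5).
print(bernoulli_logit_loglik([0.0, 0.0], [1, 0]))  # about -1.386
```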
I'm posting this as an FYI for other users who may try to run the Multi Output Regression example syntax shared here:
[Multi Output Regression](https://johngoertz.github.io/Gumbi/notebooks/examples/Cars_Dataset.html#Correlated-multi-input-regression-accross-different-classes-in-a-category)
The posted syntax is
```
gp.fit(outputs=['mpg', 'acceleration'], continuous_dims=['horsepower']);
X = gp.prepare_grid()
Y = gp.predict_grid()
axs = plt.subplots(2,1, figsize=(6, 8))[1]
for ax, output in zip(axs, gp.outputs):
    y = Y.get(output)
    gmb.ParrayPlotter(X, y).plot(ax=ax)
    sns.scatterplot(data=cars, x='horsepower', y=output, color=sns.cubehelix_palette()[-1], alpha=0.5, ax=ax);
```
The error is in `axs = plt.subplots(2,1, figsize=(6, 8))[1]`.
The correction is `axs = plt.pyplot.subplots(2,1, figsize=(6, 8))[1]`.
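Whether `plt.pyplot.subplots` is needed depends on how `plt` was imported. It only works if the name `plt` is bound to the `matplotlib` package itself (e.g. `import matplotlib as plt`) and `matplotlib.pyplot` has been imported somewhere. With the conventional `import matplotlib.pyplot as plt`, the posted `plt.subplots(...)` call is already correct, as this sketch shows (the `Agg` backend is selected only so it runs without a display):

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Conventional import: `plt` is the pyplot module, so plt.subplots works directly.
fig, axs = plt.subplots(2, 1, figsize=(6, 8))
print(len(axs))  # 2

# The `.pyplot.subplots` form is valid only when the name refers to the package.
import matplotlib as mpl

fig2, axs2 = mpl.pyplot.subplots(2, 1, figsize=(6, 8))
print(len(axs2))  # 2
```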