blitz-bayesian-deep-learning's Issues

Implementing Minibatches / KL re-weighting

I was taking a look at the examples you provided for training a BBP model and noticed that you train with a batch size of 16 in the example below:

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

In the sample_elbo function you include the ability to set a complexity_cost_weight, but at the moment this does not seem to be utilised. In section 3.4 of the Weight Uncertainty in Neural Networks paper, the authors suggest a method to calculate the minibatch weight: with the data split into M minibatches, the complexity cost of the i-th minibatch is weighted by π_i = 2^(M-i) / (2^M - 1).

This shouldn't be too hard to implement and would help ensure that the loss is calculated correctly.
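A minimal sketch of that section 3.4 weighting, assuming batches are numbered 1..M within each epoch. The helper name minibatch_kl_weights is hypothetical (it is not part of blitz); the idea is that each weight would be passed as complexity_cost_weight for the corresponding batch.

```python
def minibatch_kl_weights(num_batches, batch_size=1):
    """pi_i = 2**(M - i) / (2**M - 1) for i = 1..M (Blundell et al., 2015, sec. 3.4).

    The weights sum to 1 over an epoch. Dividing by batch_size rescales them for a
    criterion that averages (rather than sums) the negative log-likelihood over
    the minibatch, as torch's default reduction does.
    """
    M = num_batches
    denom = 2 ** M - 1
    return [2 ** (M - i) / denom / batch_size for i in range(1, M + 1)]

# Hypothetical usage inside the training loop:
# weights = minibatch_kl_weights(len(dataloader_train), batch_size=16)
# for i, (x, y) in enumerate(dataloader_train):
#     loss = model.sample_elbo(inputs=x, labels=y, criterion=criterion,
#                              sample_nbr=3, complexity_cost_weight=weights[i])
```

The same rescaling argument explains the 1/X_train.shape[0] used in the examples: with uniform weights π_i = 1/M and a mean-reduced criterion, the KL weight becomes 1/(M * batch_size) = 1/N. If the last batch is smaller, its weight is only approximately right.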

Bayesian Siamese Network

Hi! I'm trying to build a Siamese network using Bayesian layers, but I'm running into the following error:

/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/traitlets/config/application.py", line 664, in launch_instance
    app.start()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 583, in start
    self.io_loop.start()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 149, in start
    self.asyncio_loop.run_forever()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/asyncio/base_events.py", line 442, in run_forever
    self._run_once()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/asyncio/base_events.py", line 1462, in _run_once
    handle._run()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/ioloop.py", line 690, in <lambda>
    lambda f: self._run_callback(functools.partial(callback, future))
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 787, in inner
    self.run()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 748, in run
    yielded = self.gen.send(value)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 361, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 541, in execute_request
    user_expressions, allow_stdin,
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 300, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2858, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2886, in _run_cell
    return runner(coro)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3063, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3254, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-78-4c5bd945864c>", line 12, in <module>
    pred = sn(data1.to(torch.int64), data2.to(torch.int64))
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "<ipython-input-72-7e84bf9dfb73>", line 24, in forward
    input_1 = self.e3(input_1)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/blitz/modules/linear_bayesian_layer.py", line 72, in forward
    b = self.bias_sampler.sample()
  File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/blitz/modules/weight_sampler.py", line 33, in sample
    self.w = self.mu + self.sigma * self.eps_w

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-78-4c5bd945864c> in <module>
     17 
     18         # Backpropagation
---> 19         loss.backward()
     20 
     21         optimizer.step()

~/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    164                 products. Defaults to ``False``.
    165         """
--> 166         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    167 
    168     def register_hook(self, hook):

~/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

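Here is my code:

import torch
import torch.nn as nn
from blitz.modules import BayesianLinear, BayesianLSTM
from blitz.utils import variational_estimator

@variational_estimator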
class SiameseNet(nn.Module):
    
    def __init__(self, maximo, vocab1, vocab2):
        super(SiameseNet, self).__init__()
        
        self.embedding1= nn.Embedding(len(vocab1.keys()),maximo)
        self.gru1= BayesianLSTM(maximo, maximo)
        self.e1 = BayesianLinear(maximo,16)
        self.e2 = BayesianLinear(16, 8)
        self.e3 = BayesianLinear(8, 8)
        
    def euclidean_distance(self,input_1, input_2):
        input_1, input_2 = input_1[:, -1, :], input_2[:, -1, :]
        dist = ((input_1-input_2)**2).sum(dim=1)
        return dist
        
    def forward(self,input_1, input_2):
        input_1 = self.embedding1(input_1)
        input_1, hidden1= self.gru1(input_1)
        input_1 = self.e1(input_1)
        input_1 = self.e2(input_1)
        input_1 = self.e3(input_1)
        
        input_2 = self.embedding1(input_2)
        input_2, hidden2= self.gru1(input_2)
        input_2 = self.e1(input_2)
        input_2 = self.e2(input_2)
        input_2 = self.e3(input_2)
        
        output = self.euclidean_distance(input_1, input_2)
        return torch.sigmoid(output)
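I can't confirm the cause from the traceback alone, but one plausible reading is that calling the same Bayesian layers twice in one forward pass makes them resample and overwrite state that autograd saved for the first call. Independently of the error, two calls mean the two branches see different weight samples, which is usually not what a Siamese network intends. A minimal sketch of a workaround under that assumption (twin_forward is a hypothetical helper) runs both inputs through the shared tower in a single batched call:

```python
import torch
import torch.nn as nn

def twin_forward(tower: nn.Module, input_1: torch.Tensor, input_2: torch.Tensor):
    """Run both Siamese inputs through `tower` in one call and split the result."""
    n1 = input_1.size(0)
    both = torch.cat([input_1, input_2], dim=0)  # stack along the batch dimension
    out = tower(both)                            # one pass, so each layer samples its weights once
    return out[:n1], out[n1:]                    # first n1 rows belong to input_1
```

In SiameseNet this would mean moving the embedding, LSTM and linear layers into one hypothetical `tower` module (unpacking the tuple that BayesianLSTM returns) and calling `twin_forward(self.tower, input_1, input_2)`. It assumes both inputs are padded to the same sequence length, since they are concatenated along the batch dimension.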

Monte Carlo Sampling

In the code below, loss is reassigned to criterion(outputs, labels) on each iteration of the loop, so I don't believe the loss accumulates across samples. Is that intended?

loss = 0
for _ in range(sample_nbr):
    outputs = self(inputs)
    loss = criterion(outputs, labels)
    loss += self.nn_kl_divergence() * complexity_cost_weight
return loss / sample_nbr

In addition, you normalise the entire loss at the end; might it be more appropriate to normalise only the sampled losses?

Finally, in the paper the authors subtract the data-dependent part from the KL divergence, whereas your code appears to add them. Would it be correct to change the sign here, or does that depend entirely on the criterion used?

The changes I'm proposing are summarised in the code below:

loss = 0 
for _ in range(sample_nbr): 
    outputs = self(inputs)
    loss += self.nn_kl_divergence() * complexity_cost_weight 
loss /= sample_nbr
return loss - criterion(outputs, labels)
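A sketch of the accumulation the first point asks for, written against a hypothetical `draw` callable so it runs without blitz. Dividing the total by sample_nbr averages the data term and the complexity term alike, which seems consistent because, judging from the layer code in the tracebacks here (log_prior is evaluated at the sampled w), the KL is itself a one-sample estimate for each draw. On the sign: nn.CrossEntropyLoss already returns a negative log-likelihood, so adding it to the KL gives KL - E[log p(D|w)], the quantity the paper minimises.

```python
def mc_elbo_loss(draw, sample_nbr, complexity_cost_weight=1.0):
    """Monte Carlo estimate of the negative ELBO.

    draw: hypothetical zero-argument callable that performs ONE stochastic forward
    pass and returns (nll, kl) for that weight sample, e.g.
    (criterion(self(inputs), labels), self.nn_kl_divergence()).
    """
    loss = 0.0
    for _ in range(sample_nbr):
        nll, kl = draw()
        loss += nll + kl * complexity_cost_weight  # accumulate, do not overwrite
    return loss / sample_nbr                       # average over weight samples
```

Because the forward pass and the KL come from the same call, each draw's complexity term matches the weights that produced its outputs.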

ValueError: The value argument must be within the support

I found a weird bug. I was able to reproduce it with the bayesian_LeNet_mnist example by reducing the number of parameters.

I have no idea why, but based on my tests it occurs randomly during training (although with the same seed it always triggers at the same iteration). I was also only able to reproduce it with small networks; if I add more neurons or more layers, the problem never happens.

Error traceback

Traceback (most recent call last):
  File "networks/test.py", line 66, in <module>
    main()
  File "networks/test.py", line 40, in main
    loss = classifier.sample_elbo(inputs=datapoints.to(device),
  File "venv/lib/python3.8/site-packages/blitz/utils/variational_estimator.py", line 65, in sample_elbo
    outputs = self(inputs)
  File "venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "networks/test.py", line 28, in forward
    out = self.fc3(out)
  File "venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "venv/lib/python3.8/site-packages/blitz/modules/linear_bayesian_layer.py", line 93, in forward
    self.log_prior = self.weight_prior_dist.log_prior(w) + b_log_prior
  File "venv/lib/python3.8/site-packages/blitz/modules/weight_sampler.py", line 84, in log_prior
    prob_n1 = torch.exp(self.dist1.log_prob(w))
  File "venv/lib/python3.8/site-packages/torch/distributions/normal.py", line 73, in log_prob
    self._validate_sample(value)
  File "venv/lib/python3.8/site-packages/torch/distributions/distribution.py", line 277, in _validate_sample
    raise ValueError('The value argument must be within the support')
ValueError: The value argument must be within the support

Code

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from blitz.modules import BayesianConv2d, BayesianLinear
from blitz.utils import variational_estimator


def main():
    train_dataset = dsets.MNIST(root="./cache", train=True, transform=transforms.ToTensor(), download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)

    test_dataset = dsets.MNIST(root="./cache", train=False, transform=transforms.ToTensor(), download=True)
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=True)

    @variational_estimator
    class BayesianCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc2 = BayesianLinear(784, 10)
            self.fc3 = BayesianLinear(10, 10)

        def forward(self, x):
            out = x.view(x.size(0), -1)
            out = F.relu(self.fc2(out))
            out = self.fc3(out)
            return out

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    classifier = BayesianCNN().to(device)
    optimizer = optim.Adam(classifier.parameters(), lr=0.001)
    criterion = torch.nn.CrossEntropyLoss()

    iteration = 0
    for epoch in range(100):
        for i, (datapoints, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            loss = classifier.sample_elbo(inputs=datapoints.to(device),
                                          labels=labels.to(device),
                                          criterion=criterion,
                                          sample_nbr=3,
                                          complexity_cost_weight=1 / 50000)
            # print(loss)
            loss.backward()
            optimizer.step()

            iteration += 1
            if iteration % 250 == 0:
                print(loss)
                correct = 0
                total = 0
                with torch.no_grad():
                    for data in test_loader:
                        images, labels = data
                        outputs = classifier(images.to(device))
                        _, predicted = torch.max(outputs.data, 1)
                        total += labels.size(0)
                        correct += (predicted == labels.to(device)).sum().item()
                print('Iteration: {} | Accuracy of the network on the 10000 test images: {} %'
                      .format(str(iteration), str(100 * correct / total)))


if __name__ == '__main__':
    main()
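The error is raised by argument validation in torch.distributions.Normal: its support is the whole real line, and the support check rejects NaN values. So the most likely explanation (an assumption, not confirmed against blitz) is that a sampled weight, or one of the mu/rho parameters it is built from, became NaN during training. A hypothetical helper to locate the offending parameters:

```python
import torch
import torch.nn as nn

def nonfinite_parameters(model: nn.Module):
    """Return the names of parameters that contain NaN or +/-inf values."""
    return [name for name, p in model.named_parameters()
            if not torch.isfinite(p).all()]
```

Calling it right after optimizer.step() and stopping at the first non-empty result narrows down when the NaN first appears; torch.autograd.set_detect_anomaly(True) can then help locate the operation that produced it.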

Providing a minimal working example

I really like the idea of this library. However, it is really hard to get started with it.
For example, the minimal working example in the README doesn't work. No matter what I try, I cannot get it to improve the accuracy: the loss decreases towards zero, but the accuracy doesn't change. The problem is that it is totally unclear why. Is it because:

  • there is no non-linear activation function in the model?
  • the model is too big for BNNs?
  • something else is the problem?

Any help would be appreciated.

A question in stocks price prediction

Hi, Blitz is very useful, but I have a question.
When I reduce the size of the dataset, the predictions become very poor.
Changing the parameters prior_sigma_1, prior_sigma_2 and posterior_rho_init seems to improve them.

Could you give me some advice on how to initialize these parameters?
Best regards!
[Figures 1 and 2 not reproduced]
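I can't give tuned values, but the parameters may be easier to reason about with the formulas written out. Assuming blitz follows Blundell et al. (2015), prior_sigma_1 and prior_sigma_2 are the standard deviations of a two-component scale-mixture Gaussian prior (mixed with a weight pi), and posterior_rho_init sets the initial posterior standard deviation through the softplus sigma = log(1 + exp(rho)). The helpers below are hypothetical and only evaluate those formulas:

```python
import math

def softplus(rho):
    # Assumed posterior parameterisation: sigma = log(1 + exp(rho)).
    return math.log1p(math.exp(rho))

def rho_for_sigma(sigma):
    # Inverse softplus: the rho that gives an initial posterior std of sigma.
    return math.log(math.expm1(sigma))

def scale_mixture_log_prior(w, pi, sigma_1, sigma_2):
    """log[pi * N(w; 0, sigma_1^2) + (1 - pi) * N(w; 0, sigma_2^2)]."""
    def normal_pdf(x, s):
        return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return math.log(pi * normal_pdf(w, sigma_1) + (1 - pi) * normal_pdf(w, sigma_2))
```

For example, rho_for_sigma(0.05) is about -2.97, so that value of posterior_rho_init would start every weight with a standard deviation of about 0.05. With less data the KL term carries relatively more weight in the loss, so the prior has more influence on the fit; that is one plausible reason these parameters matter more on a small dataset.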

ConvTranspose2d

Hey, I may open a PR adding ConvTranspose2d soon; I was just wondering whether it is a planned near-term feature. Thanks!

Question about calculating uncertainty for lstm Seq2Seq architecture

Hi. Thank you for the easy-to-use library. I am using it in a project where I am building an encoder-decoder network with LSTMs. The last layer of the model is a linear layer, and the task is multi-class classification. I am trying to replace the LSTM and the final linear layer with their Bayesian counterparts. I am very new to BNNs, and I don't quite understand how uncertainty can be measured in this case so that the classifier makes no prediction when the uncertainty is high. Any help would be appreciated. Thank you.
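One common approach, sketched here under my own assumptions rather than as something blitz provides: run T stochastic forward passes, average the softmax outputs, and compute the entropy of the averaged distribution at each decoder step; if it exceeds a threshold, abstain. ABSTAIN and max_entropy are hypothetical names, and the threshold would have to be tuned, e.g. on validation data.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel meaning "no prediction at this step"

def predict_with_abstention(prob_samples, max_entropy):
    """prob_samples: softmax outputs of shape [T, seq_len, n_classes],
    one slice per stochastic forward pass of the Bayesian model."""
    mean_probs = prob_samples.mean(axis=0)                     # [seq_len, n_classes]
    log_p = np.log(np.where(mean_probs > 0, mean_probs, 1.0))  # treat 0 * log(0) as 0
    entropy = -np.sum(mean_probs * log_p, axis=-1)             # [seq_len]
    preds = mean_probs.argmax(axis=-1)
    return np.where(entropy > max_entropy, ABSTAIN, preds), entropy
```

Entropy is bounded by log(n_classes), which gives a natural scale for choosing the threshold.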

loss in regression task

The sample_elbo function is constructed as follow:

'''
The ELBO Loss consists of the sum of the KL Divergence of the model
(explained above, interpreted as a "complexity part" of the loss)
with the actual criterion - (loss function) of optimization of our model
(the performance part of the loss).
'''

But most other implementations compute the loss as -ELBO = -log_Likelihood + KL.
I don't think "the actual criterion" and "-log_Likelihood" are equivalent.

Specifically, I want to know why the loss in the regression task uses MSE, as follows:

def sample_elbo(self, inputs, labels, criterion, sample_nbr, complexity_cost_weight=1):
    loss = 0
    for _ in range(sample_nbr):
        outputs = self(inputs)
        loss += criterion(outputs, labels)
        loss += self.nn_kl_divergence() * complexity_cost_weight
    return loss / sample_nbr

I think log_gaussian should be used, since loss = -log_Likelihood + KL.
When the task is classification, the log-likelihood can be calculated with torch.nn.CrossEntropyLoss(),
but torch.nn.MSELoss() doesn't give the log-likelihood.
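For a Gaussian likelihood with fixed noise sigma, -log N(y | mu, sigma^2) = (y - mu)^2 / (2 sigma^2) + log(sigma) + 0.5 log(2 pi). So MSE is the Gaussian negative log-likelihood up to a scale factor and an additive constant that does not depend on the model's output; the scale still matters because it sets the balance against the KL term. Using MSE directly corresponds to an implicit noise variance of sigma^2 = 1/2. A small check of that identity (the helper names are hypothetical):

```python
import math

def gaussian_nll(y, mu, sigma):
    """-log N(y | mu, sigma^2) for a single observation."""
    return 0.5 * ((y - mu) / sigma) ** 2 + math.log(sigma) + 0.5 * math.log(2 * math.pi)

def mean_gaussian_nll(ys, mus, sigma):
    return sum(gaussian_nll(y, m, sigma) for y, m in zip(ys, mus)) / len(ys)

def mse(ys, mus):
    return sum((y - m) ** 2 for y, m in zip(ys, mus)) / len(ys)
```

Recent PyTorch versions also include torch.nn.GaussianNLLLoss, which additionally accepts a predicted variance if you want to learn sigma instead of fixing it.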

Predictive Posterior with 0 Entropy

Hi there!

I have been building predictive posterior distributions for my models by applying a softmax to the outputs of the following classifier:

(classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): BayesianLinear(
      (weight_sampler): TrainableRandomDistribution()
      (bias_sampler): TrainableRandomDistribution()
      (weight_prior_dist): PriorWeightDistribution()
      (bias_prior_dist): PriorWeightDistribution()
    )
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): BayesianLinear(
      (weight_sampler): TrainableRandomDistribution()
      (bias_sampler): TrainableRandomDistribution()
      (weight_prior_dist): PriorWeightDistribution()
      (bias_prior_dist): PriorWeightDistribution()
    )
    (5): ReLU(inplace=True)
    (6): BayesianLinear(
      (weight_sampler): TrainableRandomDistribution()
      (bias_sampler): TrainableRandomDistribution()
      (weight_prior_dist): PriorWeightDistribution()
      (bias_prior_dist): PriorWeightDistribution()
    )
  )

For some reason, my posterior after the application of a softmax looks like the following:

array([[[0., 0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        ...,
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1.]], ....

With the following dimensionality: (50, 60, 6) <- [n_posterior, n_samples, n_classes]

I'm not sure why I get only 0s and 1s; can you help explain why this may be the case? After averaging across the posterior predictive samples, I get something more useful:

array([[0.5 , 0.  , 0.24, 0.02, 0.22, 0.02],
       [0.22, 0.  , 0.58, 0.  , 0.18, 0.02],
       [0.32, 0.02, 0.22, 0.04, 0.18, 0.22],
       [0.44, 0.  , 0.38, 0.  , 0.12, 0.06],
       [0.48, 0.  , 0.4 , 0.  , 0.1 , 0.02],

However, while the averaged predictive posterior can be analyzed for entropy, I'm afraid the entropy of each individual draw before averaging is meaningless, since it is always zero. Can you help me understand what could be changed to prevent predictions of exactly 0 and 1, and why the softmax may be producing them?

Judging from the latent distribution of the data and the fact that the predicted label fluctuates heavily across posterior samples, the predictions seem highly uncertain, and this should be reflected in the individual predictive distributions. I think the reason it isn't is that the pre-softmax outputs (logits) contain several very large values (see row 1):

array([[[ 3.94745656e+05, -1.36722781e+05,  2.51609000e+05,
         -2.27076387e+04,  1.94971672e+05, -1.56166104e+04],
        [ 5.99162695e+04, -1.85560625e+05,  2.40618594e+05,
          1.35033016e+05,  6.93285859e+04, -2.45788859e+05],
        [ 1.50803141e+05, -6.68518984e+04,  8.18610234e+04,
         -1.84584469e+05,  1.43614078e+05,  1.04057883e+05],

I really don't think I should be getting logits like this, though they could reflect my setup or some instability in training these types of networks. I'm curious to hear your thoughts.
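The logits in row 1 differ by more than 1e5, so in floating point the softmax is exactly one-hot and each individual draw has zero entropy; uncertainty can then only show up as disagreement between draws. A sketch, assuming outputs are stacked as [n_posterior, n_samples, n_classes] as above, that splits the entropy of the averaged prediction into the average per-draw entropy and the mutual information between prediction and weights (a standard decomposition; the helper names are hypothetical):

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    z = logits - np.max(logits, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def entropy(p, axis=-1):
    # Treat 0 * log(0) as 0, so a one-hot vector has zero entropy.
    logp = np.log(np.where(p > 0, p, 1.0))
    return -np.sum(p * logp, axis=axis)

def uncertainty_decomposition(probs):
    """probs: softmax outputs of shape [n_posterior, n_samples, n_classes].
    Returns (predictive entropy, expected per-draw entropy, mutual information)."""
    total = entropy(probs.mean(axis=0))       # entropy of the averaged prediction
    expected = entropy(probs).mean(axis=0)    # average entropy of individual draws
    return total, expected, total - expected  # disagreement between weight samples

# Row 1 of the logits above: the gaps are so large that the softmax is exactly one-hot.
row_1 = np.array([3.94745656e+05, -1.36722781e+05, 2.51609000e+05,
                  -2.27076387e+04, 1.94971672e+05, -1.56166104e+04])
```

For three draws that each put all their mass on a different class, the per-draw entropy is 0 but the mutual information equals log 3, so averaging first is what makes the uncertainty visible. Logits of this magnitude may also be worth investigating upstream (input scaling, initialisation, learning rate), though I can't tell which from the information given.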

When will the modules support PyTorch 1.7?

I want to do some Bayesian deep learning research based on PyTorch 1.7, but when I install the package with pip I find that it depends on PyTorch 1.4. Will you update it to support PyTorch 1.7? If you can, I will be very happy!

Problem on data type (.float()/.double())

I found that if I use .double() to convert both the model and the data, the printed parameters remain unchanged (although the model does seem to be updated). Is there an explanation for this?

Bayesian representation of non-linear layers

Are there any plans to extend the set of Bayesian implementations to non-linear layers (e.g. sigmoid, tanh)? Or do they already exist and I'm just failing to find them? I would be more than glad to contribute some time and effort towards including this feature.

Cheers,
Jose
PS: Thanks for this library, I've tested 5+ different existing solutions, and Blitz is perfect for my goals of rapid prototyping and testing.

Bayesian MLP not learning for regression tasks

Hi (and thanks a bunch for this framework!),

I'm testing out a Bayesian neural net for a simple regression task. However, after a lot of training, when I test the output, I just get an (almost) constant output. I follow the same workflow as in the Boston Housing example, except I use a function to generate my dataset.

Here's my code, if you're interested:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.tensor as Tensor
import numpy as np
import matplotlib.pyplot as plt

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# These two lines are the only change I made to the data preprocessing:
X = np.expand_dims(np.random.uniform(0, 10, 1000), -1)
y = np.sin(X)

X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=.25,
                                                    random_state=42)
X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

ds_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)

@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        #self.linear = nn.Linear(input_dim, output_dim)
        self.blinear1 = BayesianLinear(1, 100)
        self.blinear2 = BayesianLinear(100, 100)
        self.blinear3 = BayesianLinear(100, 1)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x_):
        x = self.sigmoid(self.blinear1(x_))
        x = self.sigmoid(self.blinear2(x))
        x = self.blinear3(x)
        return x

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # used by the calls below
regressor = BayesianRegressor().to(device)
criterion = torch.nn.MSELoss()
optimizer = optim.Adam(regressor.parameters(), lr=0.005)

iteration = 0
hist = []
for epoch in range(1000):
    totalloss = 0
    u = 0
    for i, (datapoints, labels) in enumerate(dataloader_train):
        u += 1
        optimizer.zero_grad()
        
        loss = regressor.sample_elbo(inputs=datapoints.to(device),
                           labels=labels.to(device),
                           criterion=criterion,
                           sample_nbr=3,
                           complexity_cost_weight=1/X_train.shape[0])
        totalloss += loss.item()
        loss.backward()
        optimizer.step()
    hist.append(totalloss/u)
    print(f"[Epoch {epoch}] "+"Loss: {:.4f}".format(totalloss/u))

Then I generate the outputs:

plt.scatter(X,y)
plt.scatter(X,regressor(Tensor(X).float().to(device)).detach().cpu(),s=5)
plt.show()

which gives me the (almost) constant output described above (plot not reproduced here).

Is there something I'm doing wrong here?

pip install

I'm not entirely sure, but requirements.txt pins a specific version of torch that is currently not available, so every time I try to install the package I get an error saying the torch package is not available. I think your program is compatible with every version of PyTorch from 1.3.1 up to the latest stable release, 1.5.1, so I would write requirements.txt a bit differently:

torch>=1.3.1

Initializing net parameters

Hi,
is there a way to initialize the mus of a net that uses blitz layers with the parameters of its deterministic equivalent? To understand this better, I built a net with only one weight and one bias, so the weight can be interpreted as the slope and the bias as the intercept of a linear function. However, setting the mu of the weight and bias only seems to work for a frozen model.
Setting the blitz net's mus:

import torch
import torch.nn as nn
from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator
import matplotlib.pyplot as plt
import numpy as np

@variational_estimator
class bayesian_Net(nn.Module):

    def __init__(self, freeze):
        super(bayesian_Net, self).__init__()
        self.bl1 = BayesianLinear(1, 1, bias=True, freeze=freeze)
        
    def forward(self, x):
        x = self.bl1(x)
        return x

unfrozen_net = bayesian_Net(freeze=False)
frozen_net = bayesian_Net(freeze=True)

slope = -0.5
intercept = 1

unfrozen_net.bl1.weight_mu = torch.nn.Parameter(torch.Tensor([[slope]]), requires_grad=True)
unfrozen_net.bl1.bias_mu = torch.nn.Parameter(torch.Tensor([intercept]), requires_grad=True)

frozen_net.bl1.weight_mu = torch.nn.Parameter(torch.Tensor([[slope]]), requires_grad=True)
frozen_net.bl1.bias_mu = torch.nn.Parameter(torch.Tensor([intercept]), requires_grad=True)

x = np.zeros((10, 1))
for k in range(1, 10):
    x[k] = k
x = torch.FloatTensor(x)
#plot
for k in range(100):
    plt.plot(x, unfrozen_net(x).detach().numpy(), 'r')
    plt.plot(x, frozen_net(x).detach().numpy(), 'k')
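I haven't checked this against the blitz source, but here is a plausible explanation: if the layer builds its weight sampler once from the original weight_mu parameter and the frozen path reads self.weight_mu directly, then assigning a brand-new nn.Parameter to weight_mu changes only the frozen path, while the sampler keeps using the old tensor. The toy classes below are hypothetical stand-ins that reproduce that aliasing with NumPy arrays:

```python
import numpy as np

class ToySampler:
    """Hypothetical stand-in for a weight sampler that keeps a reference to mu."""
    def __init__(self, mu):
        self.mu = mu

    def sample(self):
        return self.mu.copy()  # zero-noise "sample": just reads the referenced array

class ToyBayesianLayer:
    """Hypothetical layer whose sampler is built once, from the layer's own mu array."""
    def __init__(self):
        self.weight_mu = np.zeros((1, 1))
        self.weight_sampler = ToySampler(self.weight_mu)

def set_mu_by_rebinding(layer, value):
    layer.weight_mu = np.array(value)  # new object: the sampler still holds the old one

def set_mu_in_place(layer, value):
    layer.weight_mu[...] = value       # same object: the sampler sees the change
    # Assumed PyTorch/blitz analogue (untested here):
    #   with torch.no_grad():
    #       net.bl1.weight_mu.copy_(torch.tensor([[slope]]))
```

If that is the cause, updating the existing parameters in place (for bias_mu as well as weight_mu) should affect both the frozen and the sampling paths.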

Uncertainties on input data

Hello,

First off, thank you for sharing this project!

I am looking for a way to include uncertainties in my input data. Basically, I want to do a regression on experimental data, where some of the data is more accurately determined, and therefore I want it to have a larger "weight".

I looked through the examples and could not find a way to do this.

First, I want to confirm if what I want is possible.
Secondly, it would be most appreciated if anyone could point me in the right direction.

Thank you for your time reading this :)
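If the uncertainties are on the measured targets (which is what "more accurately determined data should carry a larger weight" suggests), one standard option is to weight each squared error by 1/sigma^2, which corresponds to a Gaussian likelihood with known per-point noise sigma. A minimal NumPy sketch (inverse_variance_mse is a hypothetical name):

```python
import numpy as np

def inverse_variance_mse(pred, target, sigma):
    """Mean of (pred - target)^2 / sigma^2: points with small sigma weigh more."""
    pred, target, sigma = (np.asarray(a, dtype=float) for a in (pred, target, sigma))
    return float(np.mean((pred - target) ** 2 / sigma ** 2))
```

Since sample_elbo calls criterion(outputs, labels), one hypothetical way to feed sigma in is to append it as an extra column of the labels and write a criterion that splits it off again, e.g. `lambda out, lab: torch.mean((out - lab[:, :1]) ** 2 / lab[:, 1:] ** 2)`. Uncertainty on the inputs themselves (errors-in-variables) is a harder problem that this does not address.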

Blitz Bayesian model wrapped in DataParallel not using multiple GPUs

I have been trying to take the Bayesian linear regression example provided by the Blitz authors and parallelize it by wrapping the model in torch.nn.DataParallel. However, the code only seems to use one GPU rather than several. Below is the code from the bayesian_regression_boston.py example with the model wrapped in DataParallel.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(np.expand_dims(y, -1))

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=.25,
                                                    random_state=42)


X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()


@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        #self.linear = nn.Linear(input_dim, output_dim)
        self.blinear1 = BayesianLinear(input_dim, 512)
        self.blinear2 = BayesianLinear(512, output_dim)
        
    def forward(self, x):
        print("\tIn Model: input size", x.size())
        x_ = self.blinear1(x)
        x_ = F.relu(x_)
        return self.blinear2(x_)


def evaluate_regression(regressor,
                        X,
                        y,
                        samples = 100,
                        std_multiplier = 2):
    preds = [regressor(X) for i in range(samples)]
    preds = torch.stack(preds)
    means = preds.mean(axis=0)
    stds = preds.std(axis=0)
    ci_upper = means + (std_multiplier * stds)
    ci_lower = means - (std_multiplier * stds)
    ic_acc = (ci_lower <= y) * (ci_upper >= y)
    ic_acc = ic_acc.float().mean()
    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower <= y).float().mean()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
regressor = BayesianRegressor(13, 1).to(device)
optimizer = optim.Adam(regressor.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  regressor = nn.DataParallel(regressor)
regressor.to(device)

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

ds_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)


iteration = 0
for epoch in range(100):
    for i, (datapoints, labels) in enumerate(dataloader_train):
        optimizer.zero_grad()
        
        print("Outside: input size", datapoints.size())
        loss = regressor.module.sample_elbo(inputs=datapoints.to(device),
                           labels=labels.to(device),
                           criterion=criterion,
                           sample_nbr=1,
                           complexity_cost_weight=1/X_train.shape[0])
        loss.backward()
        optimizer.step()
        
        iteration += 1
        if iteration%100==0:
            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,
                                                                        X_test.to(device),
                                                                        y_test.to(device),
                                                                        samples=25,
                                                                        std_multiplier=3)
            
            print("CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}".format(ic_acc, under_ci_upper, over_ci_lower))
            print("Loss: {:.4f}".format(loss))

Below I provide a portion of the output. I print the input size before calling regressor.module.sample_elbo, and it should be batch_size x n_variables (which is printed correctly). However, inside the model's forward method it should print the size of each smaller batch handled by each GPU, so it should have printed 8 lines of 'In Model: input size torch.Size([2, 13])'. Apparently, it is only putting the data on one GPU.

Could you please let me know what needs to be done for this to work on multiple GPUs? FYI: I use the methods suggested here to check whether multiple GPUs are actually being used for processing the input batch.

Let's use 8 GPUs!
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
        In Model: input size torch.Size([16, 13])
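A likely cause, stated as an assumption rather than a confirmed diagnosis: `nn.DataParallel` only splits a batch across GPUs inside its own `forward()`. Calling `regressor.module.sample_elbo(...)` goes straight to the wrapped model, so the whole batch of 16 runs on a single device. One workaround is to move the ELBO call into the `forward()` of a small wrapper module and wrap *that* in `DataParallel`. `ElboWrapper` is a hypothetical name; it assumes the model exposes `sample_elbo(inputs, labels, criterion, sample_nbr, complexity_cost_weight)` as used in the training loop above.

```python
import torch
import torch.nn as nn


class ElboWrapper(nn.Module):
    """Hypothetical helper: computes the ELBO inside forward() so that
    nn.DataParallel can scatter (inputs, labels) across the visible GPUs."""

    def __init__(self, model, criterion, sample_nbr=1, complexity_cost_weight=1.0):
        super().__init__()
        self.model = model  # the *unwrapped* Bayesian model
        self.criterion = criterion
        self.sample_nbr = sample_nbr
        self.complexity_cost_weight = complexity_cost_weight

    def forward(self, inputs, labels):
        # Under DataParallel, each replica receives only its slice of the batch here.
        loss = self.model.sample_elbo(inputs=inputs,
                                      labels=labels,
                                      criterion=self.criterion,
                                      sample_nbr=self.sample_nbr,
                                      complexity_cost_weight=self.complexity_cost_weight)
        # A 1-element tensor lets DataParallel concatenate the per-GPU losses.
        return loss.reshape(1)
```

In the loop above one would build `elbo = ElboWrapper(regressor, criterion, complexity_cost_weight=1/X_train.shape[0]).to(device)` from the unwrapped regressor, wrap it with `nn.DataParallel` when more than one GPU is visible, and use `loss = elbo(datapoints.to(device), labels.to(device)).mean()`. Each replica then draws its own weight samples, so the averaged loss is still a Monte Carlo ELBO estimate; this has not been checked against blitz's internals.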

Import error

Hi,

ImportError: cannot import name 'GaussianVariational' from 'blitz.modules.weight_sampler' (/opt/conda/lib/python3.7/site-packages/blitz/modules/weight_sampler.py)

Using BNN to Make a Prediction

Hi, I have a question about using Blitz to make predictions. Taking the Boston dataset as an example, I can get the loss and accuracy for the model, and I'm wondering what I can do to make a prediction, i.e. load X_train into the trained model again and compare the predicted values to the true values.
Forgive my limited knowledge of neural networks; I had never done anything in this field. : )
Best Regards
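A minimal sketch of the usual approach, written in NumPy so it runs without a trained model (assumption: it mirrors the sampling loop in `evaluate_regression` above). Every forward pass of a Bayesian network samples new weights, so you call the model many times on the same inputs and summarize the samples; the mean is the point prediction to compare with the true values, and the std is its uncertainty. `predict_with_uncertainty` and `sample_fn` are hypothetical names; with a blitz regressor, `sample_fn` would be something like `lambda X: regressor(X).detach().cpu().numpy()`, called under `torch.no_grad()`.

```python
import numpy as np


def predict_with_uncertainty(sample_fn, X, n_samples=100):
    """Call a stochastic model n_samples times on X.

    Returns the per-input mean prediction and its standard deviation.
    """
    # Stack the samples along a new leading axis: (n_samples, n_inputs, ...)
    preds = np.stack([sample_fn(X) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
```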

Bayesian UNet

Do you have any examples we could use for a Bayesian UNet (3D)? I'm really interested in using blitz for some research, so this would be helpful!

freeze_, unfreeze_ vs model.eval()/model.train()

Thanks for an excellent and well thought out framework.

It looks like, from a model-evaluation perspective, freeze/unfreeze have no role to play. To predict on a held-out set it is enough to use torch.no_grad() and model.eval().

After all, none of the trainable parameters, including the posterior parameters rho and mu, are affected by freeze/unfreeze (unless we call model.eval()).
Is that correct?

Variational Gaussian Log Likelihood

I was taking a look at the way the log likelihood is calculated for the variational Gaussian distribution, and in the second term you use self.sigma as opposed to torch.log(self.sigma):

log_posteriors = -log_sqrt2pi - self.sigma - (((self.w - self.mu) ** 2)/(2 * self.sigma ** 2))

As sigma is calculated by sigma = log(1 + exp(rho)),
I'm not entirely sure whether another log function needs to be applied here or not - what do you think?
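For reference, the Gaussian log-density is log N(w | mu, sigma^2) = -log sqrt(2*pi) - log(sigma) - (w - mu)^2 / (2 * sigma^2), so the normalization term is log(sigma), not sigma. The NumPy sketch below (function names are illustrative, not blitz's) checks this numerically: the version with log(sigma) integrates to 1, while the quoted version integrates to sigma * exp(-sigma).

```python
import numpy as np

LOG_SQRT_2PI = 0.5 * np.log(2 * np.pi)


def gaussian_log_prob(w, mu, sigma):
    # Correct log-density, with the log(sigma) normalization term.
    return -LOG_SQRT_2PI - np.log(sigma) - (w - mu) ** 2 / (2 * sigma ** 2)


def gaussian_log_prob_as_quoted(w, mu, sigma):
    # Variant quoted in the issue: subtracts sigma instead of log(sigma).
    return -LOG_SQRT_2PI - sigma - (w - mu) ** 2 / (2 * sigma ** 2)


def integrate(log_pdf, mu, sigma, n=200_001):
    # Riemann sum of exp(log_pdf) over mu +/- 12 sigma; ~1.0 for a valid density.
    grid, dx = np.linspace(mu - 12 * sigma, mu + 12 * sigma, n, retstep=True)
    return np.exp(log_pdf(grid, mu, sigma)).sum() * dx
```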

classification with freeze_ and unfreeze_

Hi,
First of all, I very much appreciate your wonderful work!
Currently, I am testing CIFAR10 classification task with your example code.
I would like to ask whether 10% test accuracy is expected when I call freeze_() at test time.

    # Test time
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            classifier.freeze_()  # forward passes now use fixed (non-sampled) weights
            outputs = classifier(images.to(device))
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels.to(device)).sum().item()
    print(f'Freeze Epoch {epoch} | {str(100 * correct / total)}% | Elapsed: {time.time() - tic:.1f}s')
    classifier.unfreeze_()

I only added classifier.freeze_() before getting the outputs.
I thought the accuracy would be similar to the one with unfreeze_(), but it seems not.
With freeze mode activated I got 10%, while unfreeze mode reaches 45% in the first epoch.
Since I only activate it at test time, the training loss keeps going down.

Best Regards,
YJ

Do you have cuda support?

Hi!
If I run the model on a GPU, I get
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

Do you have plans to add cuda support?

Why my loss becomes NAN

I use a BLSTM to make time-series predictions, but the loss becomes NaN after about 40 epochs. A standard LSTM trains well on my dataset. Do you know of any possible reason? Thank you so much.

3D and 1DConv

Hi,

If I copy the BayesianConv2d class and replace the 2D conv with a 3D or 1D conv, will it function as an appropriate Bayesian 3D or 1D convolutional layer? At first glance it seems to be working.

Thanks,
Daniel

How to enable posterior sharpening?

Hi there, thank you for your amazing implementation. While looking at your code, I ran into a problem with posterior sharpening. In "base_bayesian_module.py", your posterior sharpening needs "loss" as an input. In "lstm_bayesian_layer.py", you call posterior sharpening in the "forward" function under three conditions. I was wondering how you calculate the posterior sharpening loss. I found the "forward_with_sharpening" function; however, this is not the way the original paper introduces it. Thanks in advance.

weights std in BayesianLinear

Hey,

sorry, I could not find the answer.

How can I get the uncertainty estimate for each weight in the BayesianLinear layer? I mean the std of each weight.

Thanks
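In Bayes by Backprop each weight has a Gaussian posterior with mean mu and standard deviation sigma = log(1 + exp(rho)), so the per-weight std is the softplus of the rho parameter. A minimal NumPy sketch; the attribute names in the usage note (`weight_mu`, `weight_rho`) are an assumption about BayesianLinear and should be checked against the installed blitz version.

```python
import numpy as np


def posterior_std(rho):
    """Per-weight posterior std: sigma = log(1 + exp(rho)).

    np.logaddexp(0, rho) computes log(exp(0) + exp(rho)) without
    overflowing for large rho.
    """
    return np.logaddexp(0.0, np.asarray(rho, dtype=float))
```

For example, `sigma = posterior_std(layer.weight_rho.detach().cpu().numpy())` would give one std per entry of the weight matrix, alongside the means in `layer.weight_mu` (assumed names).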

Question about adding linear bayesian layers on top of a conv network

Thanks for this awesome work.

I'm working with GCN networks, and I would like to introduce uncertainty into the predictions. I have added linear Bayesian layers on top of my convolutional network (as the last layers).

I'm getting good prediction results with uncertainty.

I'm not sure whether this is a good way to introduce uncertainty, or whether I have to use weight uncertainty in every single layer of the network.

Bayesian Embedding Implementation

Hi,

I’m a beginner trying to implement BayesianEmbedding. I have an array of embeddings of dimension 768 that I’m trying to feed into a feedforward BayesianEmbedding network and reduce to a dimension of 1. I have a target against which I would like to compare the output using an RMSE criterion.

It would really help if you could provide me with a sample implementation like the one you’ve provided for Bayesian Linear Regression. Thank you very much.

Image inference of a BayesianCNN on the MNIST dataset

Hello @piEsposito, thank you very much for this nice Pythonic implementation of Bayesian neural nets!

Excuse me for this massive post; it contains my inference script (which is a bit lengthy).

I have used your training script example (blitz/examples/bayesian_LeNet_mnist.py) to train a Bayesian CNN on the MNIST dataset. Then, I made an inference script using the training weights:

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms

from blitz.modules import BayesianLinear, BayesianConv2d
from blitz.losses import kl_divergence_from_nn
from blitz.utils import variational_estimator

import matplotlib.pyplot as plt
import numpy as np
import time
np.set_printoptions(formatter={'float_kind':'{:f}'.format})

train_dataset = dsets.MNIST(root="./data",
                             train=True,
                             transform=transforms.ToTensor(),
                             download=True
                            )
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64,
                                           shuffle=True)

test_dataset = dsets.MNIST(root="./data",
                             train=False,
                             transform=transforms.ToTensor(),
                             download=True
                            )
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                           batch_size=64,
                                           shuffle=True)


def plot_uncertain_images(uncertain_images, uncertain_vars, n_images=20):
    # Indices of the (up to) n_images images with the highest prediction variance,
    # in ascending order. np.argsort avoids list.index(), which returns the same
    # image again whenever two variances are tied.
    top_idx = np.argsort(uncertain_vars)[-n_images:]
    fig = plt.figure(figsize=(8, 8))
    columns = 4
    rows = 5
    for pos, idx in enumerate(top_idx, start=1):
        fig.add_subplot(rows, columns, pos)
        plt.imshow(uncertain_images[idx])
    plt.show()

@variational_estimator
class BayesianCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = BayesianConv2d(1, 6, (5,5))
        self.conv2 = BayesianConv2d(6, 16, (5,5))
        self.fc1   = BayesianLinear(256, 120)
        self.fc2   = BayesianLinear(120, 84)
        self.fc3   = BayesianLinear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
classifier = BayesianCNN()
# map_location lets weights saved on a GPU load on a CPU-only machine
classifier.load_state_dict(torch.load("./weights/epoch-66.pt", map_location=device))
classifier.to(device)
classifier.eval()

samples = 100
correct = 0
predicted = 0
uncertain_vars = []
uncertain_images = []

## do the image inference on the test-dataset
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        batch_size = images.shape[0]

        predictions = np.zeros((batch_size, samples)).astype(np.uint8)

        for i in range(samples):
            outputs = classifier(images.to(device))
            probs = F.softmax(outputs.data,1)
            preds = torch.argmax(probs,1)
            predictions[:,i] = preds.detach().cpu().numpy()

        var = np.var(predictions, axis=1)

        for j in range(batch_size):
            pred_var = var[j]
            if pred_var == 0:
                # I'm sure about the prediction, so I'm going to predict
                predicted += 1
                prediction = int(np.percentile(predictions[j,:], 50))
                correct += (prediction == labels[j].numpy()).sum().item()
            else:
                # I'm not sure about the prediction, so I'll skip the prediction
                uncertain_vars.append(pred_var)
                img = np.multiply(images[j].permute(1, 2, 0).detach().cpu().numpy(), 255).astype(np.uint8)
                uncertain_images.append(img)

print('Accuracy of the network on the {0:d} predicted test images: {1:.2f} %'.format(predicted, (100 * correct / predicted)))
plot_uncertain_images(uncertain_images, uncertain_vars)


## do the same trick, but then on 64 randomly generated images ('noise')
batch_size = 64
predicted = 0
images_random = torch.rand(batch_size,1,28,28)
labels_random = torch.randint(0,10,(batch_size,))
predictions = np.zeros((batch_size, samples)).astype(np.uint8)
certain_labels = []
certain_images = []

for i in range(samples):
    outputs = classifier(images_random.to(device))
    probs = F.softmax(outputs.data,1)
    preds = torch.argmax(probs,1)
    predictions[:,i] = preds.detach().cpu().numpy()

var = np.var(predictions, axis=1)
for j in range(batch_size):
    pred_var = var[j]
    if pred_var == 0:
        # I'm sure about the prediction, so I'm going to predict
        predicted += 1
        prediction = int(np.percentile(predictions[j,:], 50))
        img = np.multiply(images_random[j].permute(1, 2, 0).detach().cpu().numpy(), 255).astype(np.uint8)
        certain_images.append(img)
        certain_labels.append(prediction)
    else:
        # I'm not sure about the prediction, so I'll skip the prediction
        pass

print('{0:d} predictions were made on {1:d} images with random noise'.format(predicted, batch_size))
for h in range(len(certain_images)):
    plt.imshow(certain_images[h])
    plt.title('Prediction: {:d}'.format(certain_labels[h]))
    plt.show()

My main question: what is a good "decision rule" to select the MNIST-digits the BayesianCNN is less confident about?

As you can see in my script (pred_var == 0), I sampled the BayesianCNN 100 times and then rejected a digit when the variance of the 100 estimates exceeded 0 (meaning that the prediction is rejected when at least one of estimates deviates from the rest). I have also done a sanity check by simulating 64 random noise images, and then checking whether the BayesianCNN is giving uniform predictions there...

This is one of the outputs that was generated:

"Accuracy of the network on the 9054 predicted test images: 99.96 %"
"5 predictions were made on 64 images with random noise"

Logically, these 5 predictions (all predictions were of digit '8' by the way) are much better than the 64 predictions that were made with a standard LeNet trained on MNIST (without the Bayesian Layers).
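The pred_var == 0 rule is the strictest case of a vote-agreement rule: a digit is kept only if all sampled predictions agree. Note also that the variance of class labels treats them as numbers, so a 1-vs-9 split counts as more uncertain than a 4-vs-5 split. A hedged NumPy sketch of a tunable alternative (function names and the 0.9 default threshold are illustrative, not from blitz) computes, for each image, the majority label and the fraction of samples that vote for it; a threshold of 1.0 reproduces the rule above, and lowering it trades coverage for accuracy.

```python
import numpy as np


def vote_agreement(predictions, n_classes=10):
    """predictions: int array (batch, samples) of sampled class labels.

    Returns the majority label per row and the fraction of samples
    that agree with it (1.0 means every sample predicted the same class).
    """
    # counts[b, c] = number of samples in row b that predicted class c
    counts = (predictions[:, :, None] == np.arange(n_classes)).sum(axis=1)
    majority = counts.argmax(axis=1)
    agreement = counts.max(axis=1) / predictions.shape[1]
    return majority, agreement


def accept(predictions, threshold=0.9, n_classes=10):
    """Boolean mask of images whose agreement reaches the threshold."""
    _, agreement = vote_agreement(predictions, n_classes)
    return agreement >= threshold
```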

Another question: the prediction of the BayesianCNN was done with the torch.nn.functional.softmax function (followed by the selection of the highest probability of the softmax). However I'm wondering if this softmax approach is correct... what would you advise? Is there a better probabilistic/Bayesian way?

Thanks in advance!
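On the softmax question, one common alternative (a sketch, assuming you keep the raw outputs of every forward pass rather than only the argmax) is to average the softmax probabilities over the samples. This gives a Monte Carlo estimate of the predictive distribution, whose entropy can serve as the uncertainty score: near 0 when all samples confidently agree, and up to log(10) for a uniform prediction over the 10 digits. Function names are illustrative.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mc_predictive(logits_samples):
    """logits_samples: (samples, batch, classes) raw outputs of repeated passes.

    Returns the averaged class probabilities (batch, classes) and the
    predictive entropy (batch,) in nats.
    """
    probs = softmax(logits_samples, axis=-1).mean(axis=0)
    entropy = -(probs * np.log(np.clip(probs, 1e-12, None))).sum(axis=-1)
    return probs, entropy
```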

-inf in log_prior and nan in loss breaks training

Hello, first of all amazing work, and thank you for this project!
I'm trying to train simple 3-layered NN and I encountered some problems I wanted to ask about. Here is my model:

BayesianRegressor(
  (blinear1): BayesianLinear(
    (weight_sampler): GaussianVariational()
    (bias_sampler): GaussianVariational()
    (weight_prior_dist): ScaleMixturePrior()
    (bias_prior_dist): ScaleMixturePrior()
  )
  (relu): ReLU()
  (blinear2): BayesianLinear(
    (weight_sampler): GaussianVariational()
    (bias_sampler): GaussianVariational()
    (weight_prior_dist): ScaleMixturePrior()
    (bias_prior_dist): ScaleMixturePrior()
  )
  (relu2): ReLU()
  (blinear3): BayesianLinear(
    (weight_sampler): GaussianVariational()
    (bias_sampler): GaussianVariational()
    (weight_prior_dist): ScaleMixturePrior()
    (bias_prior_dist): ScaleMixturePrior()
  )
)

I'm training it on a dataset of flat/house prices I recently scraped, and I've run into a problem I can't fully understand: after a few epochs, the loss returned by model.sample_elbo is sometimes nan, which, when backpropagated, breaks the whole training because some of the weights are 'optimized' to nan:

model_copy.sample_elbo(inputs=datapoints.to(device),
                       labels=labels.to(device),
                       criterion=criterion,
                       sample_nbr=3,
                       complexity_cost_weight=1/X_train.shape[0])

I managed to track down where the incorrect values first appear, before these nans are backpropagated, and it turned out that the value of log_prior in the first Bayesian layer is sometimes -inf:

first_layer = list(model_copy.modules())[0].blinear1
first_layer.log_prior  # returns -inf

Going further, I found that the problem is in weight_prior_dist, which sometimes (roughly one time in five) returns -inf:

w = first_layer.weight_sampler.sample()  # sampled weights
prior_dist = first_layer.weight_prior_dist 
print(prior_dist.log_prior(w)) #sometimes returns -inf

Going deeper, I realised that the problem is in the prior_pdf of the first prior distribution in weight_prior_dist of the first layer. Some of the log-probabilities of the sampled weights (prior_dist.dist1.log_prob(w)) are very negative, around -100, and torch.exp rounds such values to 0. When these zeros go through torch.log in prior_dist.log_prior(w), they become -inf, so the whole mean becomes -inf, which corrupts the subsequent loss calculation:

prob_n1 = torch.exp(prior_dist.dist1.log_prob(w)) # minimal value of this tensor is equal to 0
if prior_dist.dist2 is not None:
    prob_n2 = torch.exp(prior_dist.dist2.log_prob(w))

prior_pdf = (prior_dist.pi * prob_n1 + (1 - prior_dist.pi) * prob_n2) # minimal value of this tensor is equal to 0
(torch.log(prior_pdf)).mean() #formula for calculating log_prior of weight_prior_dist, returns -inf

If I understand correctly, this means that the prior probabilities of such sampled weights are extremely small, close to zero. Could you suggest a way of tackling this so that they stay very small but not zero? Or maybe the problem is something else?
I'm still learning the details of Bayesian DL, so I hope there aren't too many silly mistakes. Thank you for any kind of help!
best regards
Rafał
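One standard remedy, offered as a sketch rather than blitz's own code, is to never leave log space: log(pi * N1 + (1 - pi) * N2) equals logaddexp(log(pi) + log N1, log(1 - pi) + log N2), which stays finite even when both exp() terms would underflow to 0 (in float32 that happens for log-probabilities a little below -100; in float64 below about -745). The NumPy functions below are illustrative; in PyTorch, torch.logaddexp and torch.logsumexp perform the same operation.

```python
import numpy as np


def normal_log_prob(w, sigma):
    # log N(w | 0, sigma^2)
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - w ** 2 / (2 * sigma ** 2)


def mixture_log_prior_naive(w, pi, sigma1, sigma2):
    # Mirrors the quoted code: exp() can underflow to 0, and log(0) = -inf.
    p = pi * np.exp(normal_log_prob(w, sigma1)) + (1 - pi) * np.exp(normal_log_prob(w, sigma2))
    return np.log(p)


def mixture_log_prior_stable(w, pi, sigma1, sigma2):
    # Same quantity computed entirely in log space.
    return np.logaddexp(np.log(pi) + normal_log_prob(w, sigma1),
                        np.log1p(-pi) + normal_log_prob(w, sigma2))
```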

Elbo loss converging but accuracy doesn't improve

I'm trying to train a Bayesian LSTM to predict remaining useful lifetime using windows of ten samples with roughly 600 features. I previously trained a conventional LSTM in TensorFlow and therefore rebuilt the architecture in PyTorch to be able to use Blitz.

The problem is that when I train with the ELBO loss, the loss converges quickly (but not to zero, which I suppose is not a problem?), but the accuracy does not change at all.
I also tried training with a normal cross-entropy loss, which works perfectly, but I'm not sure whether the 'Bayesianity' is still valid.
Another attempt was to freeze the model for the first part of training, then unfreeze it and continue training. In that case the model converges (using the ELBO loss) and the accuracy improves, but once I unfreeze and continue training, the accuracy drops again.

Any experiences with that? Is there something I could change in the code? Is it even valid to check accuracy with Bayesian networks? I think it should be, because the network should still predict correctly most of the time. (I'm completely new to Bayesian nets, though, and might not have understood everything fully...)

Huge Memory Demand

Hi, when I try to train a simple model using Blitz, the memory demand exceeds 12 GB for a very small dataset.

import torch.nn as nn
from torch.nn import Module

from blitz.modules import BayesianLinear, BayesianLSTM
from blitz.utils import variational_estimator


@variational_estimator
class BayesianLstm(Module):
    def __init__(self, output_size=1, input_size=24, hidden_size=32, seq_length=30, hidden_neurons=8, batch_size=512):
        super(BayesianLstm, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.seq_length = seq_length
        self.hidden_layer_size = hidden_size

        # First LSTM cell
        self.lstm1 = BayesianLSTM(input_size, hidden_size)
        # Second LSTM cell
        self.lstm2 = BayesianLSTM(hidden_size, hidden_size * 2)
        # First fully connected layer
        self.fc1 = BayesianLinear(hidden_size * 2, hidden_neurons)
        self.act1 = nn.ReLU()
        # self.bat1 = nn.BatchNorm1d(num_features=hidden_neurons)
        self.drop = nn.Dropout(inplace=True, p=0.5)

        # Second fully connected layer
        self.fc2 = BayesianLinear(hidden_neurons, hidden_neurons)
        self.act2 = nn.ReLU()
        # self.bat2 = nn.BatchNorm1d(num_features=hidden_neurons)

        # Output layer
        self.output = BayesianLinear(hidden_neurons, output_size)

My data is of shape [batchsize, sequence_length, number_features]
I tried this for batchsize = 512, sequence_length= 30, number_features=24.

ScaleMixturePrior::log_prior()

In the code below, log_prior is calculated by taking the mean of the log probabilities; however, the paper describes taking the product of the prior probabilities of the weights.

def log_prior(self, w):
    """
    Calculates the log_likelihood for each of the weights sampled relative to a prior distribution as a part of the complexity cost
    returns:
        torch.tensor with shape []
    """
    prob_n1 = torch.exp(self.normal1.log_prob(w))
    prob_n2 = torch.exp(self.normal2.log_prob(w))
    prior_pdf = (self.pi * prob_n1 + (1 - self.pi) * prob_n2)
    return (torch.log(prior_pdf)).mean()

Would it perhaps be preferable to use something like:

reduce(lambda a, b: a * b, log(prior_pdf))
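A note on the algebra, as a small NumPy check (the function name is illustrative): the log of a product of densities is the sum of their logs, so the log-space counterpart of the paper's product is .sum(). Multiplying the log-probabilities together, as the reduce expression does, computes a different quantity. .mean() equals .sum() divided by the number of weights, i.e. it rescales each layer's log prior by a constant that depends on that layer's size; whether that rescaling is intended in blitz is not something this sketch can settle.

```python
import numpy as np
from functools import reduce


def log_prior_reductions(densities):
    """Compare ways of reducing per-weight prior densities to one number."""
    log_p = np.log(densities)
    return {
        "log_of_product": np.log(np.prod(densities)),           # paper's product; underflows for many weights
        "sum_of_logs": log_p.sum(),                             # same value, numerically safe
        "mean_of_logs": log_p.mean(),                           # sum_of_logs / number of weights
        "product_of_logs": reduce(lambda a, b: a * b, log_p),   # the proposed reduce
    }
```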

Negative loss in binary classification

Hello. I'm experimenting with your package on binary classification.
But when I checked the loss, I found that it was sometimes a negative float.
I checked your code, and the negative part seems to come from the log prior. Is the model still learning well?

in your code

loss = criterion(outputs, labels) + self.nn_kl_divergence() * complexity_cost_weight
<-> criterion(outputs, labels) + (module.log_variational_posterior - module.log_prior) * complexity_cost_weight

in my code

criterion = torch.nn.BCELoss()
loss = classifier.sample_elbo(inputs=datapoints.to(device),
                                      labels=labels.to(device).float(),
                                      criterion=criterion,
                                      sample_nbr=3,
                                      complexity_cost_weight= 0.5
                                     )
loss.backward()


Thanks :)
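A hedged explanation with a NumPy check: BCELoss is never negative, so a negative total must come from the complexity term. As the snippet above shows, that term is log q(w) - log p(w) evaluated at sampled weights, i.e. a single-sample Monte Carlo estimate of KL(q || p). The true KL is non-negative, but individual estimates can be negative. The sketch below uses a one-dimensional Gaussian posterior and prior (parameter values are arbitrary) and compares the estimates with the closed-form KL.

```python
import numpy as np


def log_normal(w, mu, sigma):
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (w - mu) ** 2 / (2 * sigma ** 2)


def mc_kl_terms(mu_q, sigma_q, sigma_p, n, rng):
    # One-sample estimates log q(w) - log p(w) with w ~ q, as in the quoted loss.
    w = rng.normal(mu_q, sigma_q, size=n)
    return log_normal(w, mu_q, sigma_q) - log_normal(w, 0.0, sigma_p)


def analytic_kl(mu_q, sigma_q, sigma_p):
    # Closed form of KL(N(mu_q, sigma_q^2) || N(0, sigma_p^2)).
    return np.log(sigma_p / sigma_q) + (sigma_q ** 2 + mu_q ** 2) / (2 * sigma_p ** 2) - 0.5
```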

Variance of predictions is too small

Hi,

I am trying to use Blitz for a regression problem. Unfortunately, my nets don't seem to learn the variance of the data correctly. The variance of the predictions is so low that the true value is never within a reasonable confidence interval around the mean of the predictions. However, the means of the predictions are acceptable and clearly show that the nets are learning something. For better understanding, I attached a plot.
What I have tried so far:
• Different architectures with and without convolutional layers
• Changing prior_sigma_1, prior_sigma_2 and prior_pi
• Changing complexity_cost_weight in the sample_elbo method

[Figure: prediction density plot]

Best regards
Lukas

A suggestion to Blitz

First, thanks for your implementation of Bayes by Backprop in Blitz. It is a very nice tool and has helped us a lot. However, I have a minor suggestion that I hope you can consider: it would be very helpful to return not only the total loss when training the Bayesian layers, but also its two separate components, the log likelihood and the KL divergence. This would make it easier to see how the trade-off between the two terms is achieved and would benefit our training process.

Boston regression example

Hi,

I tried the Boston regression example, but the result looks somewhat strange to me:

CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31968.1094
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31707.7559
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31548.9590
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31237.2090
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30955.3594
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30741.1934
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30444.9160
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30193.9434
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 29945.0215
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 29647.4902
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 29413.7676
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 29227.4375
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 28922.7090
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 28641.3125
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 28439.4082
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 28159.0293
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 27873.6797
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 27612.1309
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 27459.8281
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 27111.7598
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26903.3359
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26611.5840
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26382.6406
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26132.0469

Process finished with exit code 0

The accuracy is not improving. Could you please elaborate a little on what is happening?

Thanks!
-Daniel
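To read these numbers, here is a NumPy mirror of the logic in evaluate_regression (the function name is hypothetical; ddof=1 matches torch.std's default unbiased estimate). Every target is below, above, or inside its interval, so CI upper acc + CI lower acc = 1 + CI acc. A log line such as 0.63 + 0.37 = 1.00 with CI acc 0.00 therefore means the intervals are narrow relative to their distance from the targets: about 63% of targets lie below their interval and 37% above it, with almost none inside.

```python
import numpy as np


def ci_coverage(pred_samples, y, std_multiplier=2.0):
    """pred_samples: (n_samples, n_points) stochastic predictions; y: (n_points,).

    Returns (CI acc, CI upper acc, CI lower acc) like evaluate_regression.
    """
    means = pred_samples.mean(axis=0)
    stds = pred_samples.std(axis=0, ddof=1)
    upper = means + std_multiplier * stds
    lower = means - std_multiplier * stds
    inside = (lower <= y) & (upper >= y)
    return inside.mean(), (upper >= y).mean(), (lower <= y).mean()
```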
