
pytorch-bayesiancnn's Introduction

Python 3.7+ | PyTorch 1.3 | License: MIT | arXiv

We introduce Bayesian convolutional neural networks with variational inference, a variant of convolutional neural networks (CNNs) in which the intractable posterior probability distributions over weights are inferred by Bayes by Backprop. We demonstrate that our proposed variational inference method achieves performance equivalent to frequentist inference in identical architectures on several datasets (MNIST, CIFAR10, CIFAR100), as described in the paper.


Filter weight distributions in a Bayesian Vs Frequentist approach

Distribution over weights in a CNN's filter.


Fully Bayesian perspective of an entire CNN

Distributions must be placed over the weights in both the convolutional layers and the fully-connected layers.


Layer types

This repository contains two types of Bayesian layer implementation:

  • BBB (Bayes by Backprop):
    Based on this paper. This layer samples all the weights individually and then combines them with the inputs to compute a sample from the activations.

  • BBB_LRT (Bayes by Backprop w/ Local Reparametrization Trick):
    This layer combines Bayes by Backprop with local reparametrization trick from this paper. This trick makes it possible to directly sample from the distribution over activations.
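To make the difference between the two layer types concrete, here is a minimal pure-Python sketch for a single linear output unit. It assumes each weight has a Gaussian posterior with mean mu and standard deviation softplus(rho); the function names are hypothetical and this is not the repository's implementation. Both samplers draw from the same activation distribution, but BBB needs one noise draw per weight, whereas LRT needs only one per activation.

```python
import math
import random


def softplus(x):
    # Maps an unconstrained rho to a positive standard deviation.
    return math.log1p(math.exp(x))


def bbb_sample(x, w_mu, w_rho, rng):
    """BBB: sample every weight, then combine the sampled weights with the inputs."""
    out = 0.0
    for xi, mu, rho in zip(x, w_mu, w_rho):
        w = mu + softplus(rho) * rng.gauss(0.0, 1.0)  # one noise draw per weight
        out += xi * w
    return out


def lrt_sample(x, w_mu, w_rho, rng):
    """LRT: sample the pre-activation directly from its Gaussian distribution."""
    act_mu = sum(xi * mu for xi, mu in zip(x, w_mu))
    act_var = sum(xi ** 2 * softplus(rho) ** 2 for xi, rho in zip(x, w_rho))
    return act_mu + math.sqrt(act_var) * rng.gauss(0.0, 1.0)  # one draw per activation


def moments(samples):
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var


x = [1.0, -2.0, 0.5]
w_mu = [0.3, -0.1, 0.8]
w_rho = [-3.0, -2.0, -1.0]
rng = random.Random(0)
bbb = moments([bbb_sample(x, w_mu, w_rho, rng) for _ in range(20000)])
lrt = moments([lrt_sample(x, w_mu, w_rho, rng) for _ in range(20000)])
print("BBB mean/var:", bbb)
print("LRT mean/var:", lrt)
```

In a minibatch setting, sampling per activation is also what lets LRT draw independent noise for every example cheaply.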


How to make your own Bayesian network

To make a custom Bayesian network, inherit from layers.misc.ModuleWrapper instead of torch.nn.Module, and use BBBLinear and BBBConv2d from either of the given layer types (BBB or BBB_LRT) instead of torch.nn.Linear and torch.nn.Conv2d. There is no need to define a forward method; ModuleWrapper takes care of it automatically.

For example:

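import torch.nn as nn

class Net(nn.Module):

  def __init__(self):
    super().__init__()
    self.conv = nn.Conv2d(3, 16, 5, stride=2)
    self.bn = nn.BatchNorm2d(16)
    self.relu = nn.ReLU()
    self.fc = nn.Linear(800, 10)  # 800 must equal 16 * H_out * W_out of the conv output for your input size

  def forward(self, x):
    x = self.conv(x)
    x = self.bn(x)
    x = self.relu(x)
    x = x.view(-1, 800)  # flatten the feature maps before the fully-connected layer
    x = self.fc(x)
    return x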

The above network can be converted to a Bayesian one as follows:

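import torch.nn as nn
from layers import BBB_Conv2d as BBBConv2d, BBB_Linear as BBBLinear  # or BBB_LRT_Conv2d / BBB_LRT_Linear
from layers import FlattenLayer, ModuleWrapper

class Net(ModuleWrapper):

  def __init__(self):
    super().__init__()
    self.conv = BBBConv2d(3, 16, 5, stride=2)
    self.bn = nn.BatchNorm2d(16)
    self.relu = nn.ReLU()
    self.flatten = FlattenLayer(800)
    self.fc = BBBLinear(800, 10)

  # No forward() is needed: ModuleWrapper calls the submodules in the order they are defined.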

Notes:

  1. Add a FlattenLayer before the first BBBLinear block.
  2. The forward method of the model returns a tuple (logits, kl).
  3. Priors can be passed as an argument to the layers. The default value is:
priors={
    'prior_mu': 0,
    'prior_sigma': 0.1,
    'posterior_mu_initial': (0, 0.1),  # (mean, std) normal_
    'posterior_rho_initial': (-3, 0.1),  # (mean, std) normal_
}
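As a rough illustration of what these defaults imply, the sketch below assumes (as the name 'posterior_rho_initial' suggests) that each weight's standard deviation is computed as sigma = softplus(rho) = log(1 + exp(rho)). Under that assumption, rho initialized around -3 gives initial posterior standard deviations of roughly 0.05, about half of prior_sigma = 0.1. The initializer is hypothetical, not the repository's code.

```python
import math
import random

priors = {
    'prior_mu': 0,
    'prior_sigma': 0.1,
    'posterior_mu_initial': (0, 0.1),    # (mean, std) used to initialize mu
    'posterior_rho_initial': (-3, 0.1),  # (mean, std) used to initialize rho
}


def softplus(x):
    return math.log1p(math.exp(x))


def init_posterior(n, priors, rng):
    """Hypothetical initializer: draw mu and rho from the configured normals."""
    mu_mean, mu_std = priors['posterior_mu_initial']
    rho_mean, rho_std = priors['posterior_rho_initial']
    mu = [rng.gauss(mu_mean, mu_std) for _ in range(n)]
    rho = [rng.gauss(rho_mean, rho_std) for _ in range(n)]
    sigma = [softplus(r) for r in rho]  # always positive, whatever the sign of rho
    return mu, rho, sigma


mu, rho, sigma = init_posterior(5, priors, random.Random(0))
print("sigma at rho = -3:", round(softplus(-3), 4))  # sigma at rho = -3: 0.0486
```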

How to perform standard experiments?

Currently, the following datasets and models are supported.

  • Datasets: MNIST, CIFAR10, CIFAR100
  • Models: AlexNet, LeNet, 3Conv3FC

Bayesian

python main_bayesian.py

  • set hyperparameters in config_bayesian.py

Frequentist

python main_frequentist.py

  • set hyperparameters in config_frequentist.py

Directory Structure:

layers/: Contains ModuleWrapper, FlattenLayer, BBBLinear and BBBConv2d.
models/BayesianModels/: Contains standard Bayesian models (BBBLeNet, BBBAlexNet, BBB3Conv3FC).
models/NonBayesianModels/: Contains standard Non-Bayesian models (LeNet, AlexNet).
checkpoints/: Checkpoint directory: Models will be saved here.
tests/: Basic unittest cases for layers and models.
main_bayesian.py: Train and Evaluate Bayesian models.
config_bayesian.py: Hyperparameters for main_bayesian file.
main_frequentist.py: Train and Evaluate non-Bayesian (Frequentist) models.
config_frequentist.py: Hyperparameters for main_frequentist file.


Uncertainty Estimation:

There are two types of uncertainty: Aleatoric and Epistemic.
Aleatoric uncertainty measures the variation inherent in the data, while Epistemic uncertainty is caused by the model.
Two methods are provided in uncertainty_estimation.py, 'softmax' and 'normalized', based respectively on equation 4 from this paper and equation 15 from this paper.
uncertainty_estimation.py can also be used to compare the uncertainties of a Bayesian neural network on the MNIST and notMNIST datasets. You can provide the following arguments:

  1. net_type: lenet, alexnet or 3conv3fc. Default is lenet.
  2. weights_path: Weights for the given net_type. Default is 'checkpoints/MNIST/bayesian/model_lenet.pt'.
  3. not_mnist_dir: Directory of notMNIST dataset. Default is 'data\'.
  4. num_batches: Number of batches for which uncertainties need to be calculated.

Notes:

  1. You need to download the notMNIST dataset from here.
  2. The parameters layer_type and activation_type used in uncertainty_estimation.py need to be set from config_bayesian.py in order to match the provided weights.

If you are using this work, please cite:

@article{shridhar2019comprehensive,
  title={A comprehensive guide to bayesian convolutional neural network with variational inference},
  author={Shridhar, Kumar and Laumann, Felix and Liwicki, Marcus},
  journal={arXiv preprint arXiv:1901.02731},
  year={2019}
}
@article{shridhar2018uncertainty,
  title={Uncertainty estimations by softplus normalization in bayesian convolutional neural networks with variational inference},
  author={Shridhar, Kumar and Laumann, Felix and Liwicki, Marcus},
  journal={arXiv preprint arXiv:1806.05978},
  year={2018}
}

pytorch-bayesiancnn's People

Contributors

kumar-shridhar, lan-qing, mttgdd, piyush-555, purvanshi, shigengtian, tuero


pytorch-bayesiancnn's Issues

How to calculate the correct in ImageRecognition/main_Bayes.py

@kumar-shridhar Hi, when testing the AlexNet model, I get an error about a size mismatch between correct and preds. How should correct be calculated? In your code, correct += preds. Should it be correct += preds.size(0)? When I use correct += preds.size(0), the test accuracy is 100%. How should correct be calculated?

def test(epoch):
    global best_acc
    net.eval()
    test_loss = 0
    correct = 0
    total = 0
    conf=[]
    m = math.ceil(len(testset) / cf.batch_size)
    for batch_idx, (inputs_value, targets) in enumerate(testloader):
        x = inputs_value.view(-1, inputs, resize, resize).repeat(cf.num_samples, 1, 1, 1)
        y = targets.repeat(cf.num_samples)
        if use_cuda:
            x, y = x.cuda(), y.cuda()
        with torch.no_grad():
            x, y = Variable(x), Variable(y)
            outputs, kl = net.probforward(x)
     ...

        loss = vi(outputs,y,kl,beta)

        test_loss += loss.data.item()
        _, predicted = torch.max(outputs.data, 1)
        preds = F.softmax(outputs, dim=1)
        results = torch.topk(preds.cpu().data, k=1, dim=1)
        #print(results[0][0].item())
        conf.append(results[0][0].item())
        total += targets.size(0)
        correct += preds
        print('total: ', total)
        print('correct:', correct)
        predicted.eq(y.data).cpu().sum()
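One way to read this: correct should count matching predictions rather than accumulate the softmax tensor, and because the inputs are repeated cf.num_samples times, both the count and total should be taken over the repeated targets y. In PyTorch that would be along the lines of correct += predicted.eq(y).sum().item() and total += y.size(0). A minimal pure-Python sketch of the same bookkeeping (hypothetical helper, not the repository's code):

```python
def argmax(row):
    return max(range(len(row)), key=lambda i: row[i])


def count_correct(outputs, targets):
    """outputs: list of score rows; targets: class indices of the same length."""
    predicted = [argmax(row) for row in outputs]
    correct = sum(int(p == t) for p, t in zip(predicted, targets))
    return correct, len(targets)


# Two test images, each repeated num_samples = 2 times (sample-major, like targets.repeat(num_samples)).
outputs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
targets = [1, 0] * 2
correct, total = count_correct(outputs, targets)
print(correct / total)  # 0.75
```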

Two convolutions doubts

Hi, I have been reading several of your papers. I am still confused about the part where you do two convolutions. Are (1) the mean and (2) the alpha, or the variance, optimized in one backpropagation, or is there one backpropagation to update the mean and then another to update the variance separately?
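A possible reading, sketched under the assumption that the two convolutions are those of the local reparameterization trick: one convolution of the input with the weight means, and one of the squared input with the weight variances. Both feed the same sampled output, so a single backward pass yields gradients for the mean and variance parameters together. A 1-D pure-Python sketch (hypothetical names; valid padding, stride 1):

```python
import math
import random


def conv1d(x, kernel):
    """Valid 1-D cross-correlation with stride 1 (what deep-learning 'conv' layers compute)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def lrt_conv1d(x, w_mu, w_sigma, rng):
    act_mu = conv1d(x, w_mu)                                         # convolution 1: means
    act_var = conv1d([v * v for v in x], [s * s for s in w_sigma])  # convolution 2: variances
    # Both convolutions enter the same output expression, so one backward pass
    # through it produces gradients for w_mu and w_sigma at the same time.
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(act_mu, act_var)]


x = [1.0, 2.0, 0.0, -1.0, 3.0]
w_mu = [0.5, -0.25, 1.0]
print(lrt_conv1d(x, w_mu, [0.0, 0.0, 0.0], random.Random(0)))  # [0.0, 0.0, 3.25]
```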

There is a problem with applying BayesianCNN to image restoration

Hello, I am trying to apply the latest version of the program to image restoration. The noise problem no longer appears in the current results. However, during testing I found that the output image is not reproduced well: its colors have changed, and its details and features have disappeared. I simply nested the BBBConv2d-related code, and the activation function is ReLU. I don't know whether I did something wrong. If I change BBBConv2d to nn.Conv2d, my program produces the desired result.
Here are the results I got:
[image: guogong1]
Original image:
[image: 1gugong]

Image noise problem

When I use the Bayesian convolutional neural network for image-processing problems, the output image is noisy. Do you know why?

Loss function

I see that you multiply the NLL loss by train_size.
This makes the value of the loss very large.
I don't understand why you do that.

Could you explain it to me?
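A likely reason, sketched under the assumption that the loss has the form train_size * mean_NLL + beta * KL: the KL term is counted once for the whole dataset, while the NLL in the full-data ELBO is a sum over all training examples. Multiplying the minibatch mean NLL by train_size gives an unbiased estimate of that sum, which puts the two terms on the same scale. The absolute value becomes large, but dividing the whole loss by train_size only rescales the loss and its gradients by a constant. Function names are hypothetical.

```python
def elbo_loss(batch_nll, kl, train_size, beta=1.0):
    """Minibatch estimate of the negative ELBO over the whole training set.

    batch_nll: per-example negative log-likelihoods for one minibatch.
    kl: KL(q(w) || p(w)) for the whole network (counted once per dataset).
    """
    mean_nll = sum(batch_nll) / len(batch_nll)
    return train_size * mean_nll + beta * kl


def per_example_loss(batch_nll, kl, train_size, beta=1.0):
    # Same objective divided by train_size: identical minimizer, smaller numbers.
    return elbo_loss(batch_nll, kl, train_size, beta) / train_size


batch_nll = [0.5, 1.0, 1.5, 1.0]
print(elbo_loss(batch_nll, kl=5000.0, train_size=50000))         # 55000.0
print(per_example_loss(batch_nll, kl=5000.0, train_size=50000))  # 1.1
```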

Output Variance

I am trying to replace the NLL with the heteroscedastic uncertainty model, using the formula given here, which expects the model to output the predicted mean and variance.
I know that the output is a tuple of the predicted mean and the KL, but is there a way to get the predicted variance?
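For reference, the standard heteroscedastic Gaussian negative log-likelihood needs the network to output a mean and a log-variance per target. The sketch below assumes a hypothetical two-output head (mean, log_var); predicting the log-variance keeps the variance positive. This is not something the repository provides out of the box.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    return 0.5 * (log_var + (y - mean) ** 2 / math.exp(log_var) + math.log(2 * math.pi))


def batch_gaussian_nll(ys, means, log_vars):
    return sum(gaussian_nll(y, m, lv) for y, m, lv in zip(ys, means, log_vars)) / len(ys)


# A larger predicted variance down-weights a large error but pays a log_var penalty.
print(gaussian_nll(3.0, 1.0, 0.0))
print(gaussian_nll(3.0, 1.0, math.log(4.0)))
```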

about the calculation of uncertainty

Hi, in /Image Recognition/main_Bayes.py, line 226, I think there is something wrong about your calculation of the uncertainty.

In your code, test is also being done in the way of batches just like training. Assume that there are 5 instances in a batch, and num_samples is 10, then in line 224, results[0] should be a 50 * 1 tensor which contains max probabilities of 5 * 10 samples, and results[1] should also be a 50 * 1 tensor which contains predicted labels of these 50 samples. Then in line 226, you only append results[0][0] on conf, which is just the max probability of the first sample of the first instance in this batch. At last, after iterating all the batches, you calculate the uncertainty using conf, which only contains max probabilities of the first sample of the first instance in each batch.

As far as I know, epistemic and aleatoric uncertainties are both based on the sampled results, which means that they are calculated on a 10 * 3 probability matrix (assuming there are 3 classes in the task) built from each instance's sampled results. In other words, you should collect all the post-softmax probabilities from each sampling of one instance; the 10 sampled results then form the p_hat matrix of that single instance. If you want the average uncertainty over all test instances, I think you should first compute the sum and then the average.

Am I correct? Thanks!
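The layout this relies on can be checked with a small sketch: x.repeat(cf.num_samples, 1, 1, 1) tiles the whole batch num_samples times, so row s * batch_size + i of the output belongs to instance i and sample s. The hypothetical helper below regroups such rows into one p_hat matrix (num_samples rows) per instance, which is what per-instance uncertainty needs.

```python
def group_by_instance(rows, batch_size, num_samples):
    """rows: outputs for a batch tiled num_samples times (sample-major order)."""
    assert len(rows) == batch_size * num_samples
    return [[rows[s * batch_size + i] for s in range(num_samples)] for i in range(batch_size)]


# Simulate a batch of 2 instances repeated 3 times; each row is tagged (instance, sample).
batch_size, num_samples = 2, 3
rows = [("inst%d" % i, "sample%d" % s) for s in range(num_samples) for i in range(batch_size)]
p_hat = group_by_instance(rows, batch_size, num_samples)
print(p_hat[1])  # [('inst1', 'sample0'), ('inst1', 'sample1'), ('inst1', 'sample2')]
```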

what is the reasoning behind α*μ² BCNN variance?

Hi, I've been following your work for a while. In your paper, you define the variance of the BCNN filters as α*μ², but in your recent change you got rid of this in favor of a general standard deviation (the parameter conv_qw_std). Does that mean you changed your idea from the original paper, or what is the reason behind the change? Or are you just trying new ideas?

Also, what is the reasoning behind using α*μ² instead of a single σ value, as in your latest code? I've read the paper but failed to understand the logic behind it.

Keep up the good work!

regarding softplus function

Hi,
You have applied the softplus function only to the standard deviation; could you please explain the reason for that? Also, could you explain in simple words how to apply the reparameterization trick to find the variance? I wasn't able to understand it from your paper.

thanks in advance
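In simple terms: a standard deviation must be positive, while gradient descent moves parameters anywhere on the real line. Writing sigma = softplus(rho) = log(1 + exp(rho)) lets rho take any real value while sigma stays positive; the mean needs no such constraint, which is why softplus is applied only to the standard deviation. The reparameterization trick then writes a weight sample as mu + sigma * eps with eps ~ N(0, 1), so the randomness sits in eps and the sample is a differentiable function of mu and rho. A minimal pure-Python sketch (hypothetical names):

```python
import math
import random


def softplus(rho):
    return math.log1p(math.exp(rho))


def sample_weight(mu, rho, rng):
    eps = rng.gauss(0.0, 1.0)  # parameter-free noise
    sigma = softplus(rho)      # positive for any real rho
    return mu + sigma * eps    # differentiable in mu and rho


rng = random.Random(1)
samples = [sample_weight(0.5, -1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 3), round(softplus(-1.0) ** 2, 3))  # variance is close to softplus(rho)^2
```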

The updated code seems faulty

The changes introduced by f41aa32 (updated KL) result in a size mismatch error, e.g. in the Bayesian_CNN_Detailed notebook.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>()

PyTorch-BayesianCNN/Image Recognition/utils/BayesianModels/BayesianAlexNet.py in probforward(self, x)
     42             if hasattr(layer, 'convprobforward') and callable(layer.convprobforward):
     43                 x, _kl, = layer.convprobforward(x)
---> 44                 kl += _kl
     45 
     46             elif hasattr(layer, 'fcprobforward') and callable(layer.fcprobforward):

RuntimeError: The size of tensor a (11) must match the size of tensor b (5) at non-singleton dimension 3

Simply summing the _kl variable resolves this error. But I am unable to train a model on MNIST, while I was able to get a working MNIST classifier using the previous version of the code (not on CIFAR, though, see #8).

Error in loss.backward()

Hi, I'm trying to adapt your code to a Regression problem. I'm getting the following error.

/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
File "bayesian_specnet.py", line 141, in
run()
File "bayesian_specnet.py", line 125, in run
train_loss, train_kl = train_model(net, optimizer, criterion, train_loader, num_ens=train_ens, beta_type=beta_type, epoch=epoch, num_epochs=n_epochs)
File "bayesian_specnet.py", line 41, in train_model
net_out, _kl = net(inputs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/content/probabilistic-nn-cars/layers/misc.py", line 19, in forward
x = module(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/activation.py", line 603, in forward
return F.softplus(input, self.beta, self.threshold)

Traceback (most recent call last):
File "bayesian_specnet.py", line 141, in
run()
File "bayesian_specnet.py", line 125, in run
train_loss, train_kl = train_model(net, optimizer, criterion, train_loader, num_ens=train_ens, beta_type=beta_type, epoch=epoch, num_epochs=n_epochs)
File "bayesian_specnet.py", line 53, in train_model
loss.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/init.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 16, 573]], which is output 0 of SoftplusBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Is this a version-related error? Could anyone give a suggestion?
Thanks

Could anyone please explain the new kl divergence to me? thx

The new way of computing the KL has confused me for a long time; could anyone explain the reason for using this equation, please? Thanks.
Also, while training the model with main_bayes.py, the accuracy was always 10% on CIFAR10 and 1% on CIFAR100, which is no better than random guessing. Could anyone help?

sigma_weight

Hi,
I have no idea how the parameter sigma_weight in BBBlayers is updated, and I'm not sure how backward() works exactly in this case. Could you please explain how it works mathematically? Thanks a lot.
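Mathematically, backward() applies the chain rule through w = mu + softplus(rho) * eps with eps held fixed: dL/dmu = dL/dw and dL/drho = dL/dw * eps * sigmoid(rho), because the derivative of softplus is the sigmoid. A KL term that depends on sigma adds its own gradient through the same softplus path. The pure-Python check below compares these formulas with finite differences for a hypothetical toy loss L = (w * x - y)^2; the old sigma_weight may be parameterized differently (for example as a log standard deviation), so this only illustrates the mechanism.

```python
import math


def softplus(rho):
    return math.log1p(math.exp(rho))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(mu, rho, eps, x, y):
    w = mu + softplus(rho) * eps
    return (w * x - y) ** 2


def analytic_grads(mu, rho, eps, x, y):
    w = mu + softplus(rho) * eps
    dL_dw = 2.0 * (w * x - y) * x
    return dL_dw, dL_dw * eps * sigmoid(rho)  # (dL/dmu, dL/drho)


mu, rho, eps, x, y = 0.2, -2.0, 0.7, 1.5, 1.0
g_mu, g_rho = analytic_grads(mu, rho, eps, x, y)
h = 1e-6
fd_mu = (loss(mu + h, rho, eps, x, y) - loss(mu - h, rho, eps, x, y)) / (2 * h)
fd_rho = (loss(mu, rho + h, eps, x, y) - loss(mu, rho - h, eps, x, y)) / (2 * h)
print(g_mu, fd_mu)
print(g_rho, fd_rho)
```

An optimizer step then updates both parameters from the same backward pass, e.g. mu -= lr * g_mu and rho -= lr * g_rho for plain SGD.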

an error about GaussianVariationalInference

Hi, I want to run the Image Recognition part of your code, but in PyTorch-BayesianCNN/Image Recognition/main_Bayes.py, lines 27 and 139, there is an error:

from utils.BBBlayers import GaussianVariationalInference
vi = GaussianVariationalInference(torch.nn.CrossEntropyLoss())

This is because the BBBlayers.py file doesn't contain GaussianVariationalInference, whereas your old code BBBlayers_.py has a class GaussianVariationalInference(nn.Module) at line 308. It seems that your new code needs a small fix...

explanation about the code you changed

Hi, could you please briefly explain the difference between the old and new versions of the code? There are so many parameters in your old code, especially in BBBlayers.py. I really don't know what each parameter means (e.g. qw_mean, conv_qw_mean, qw_logvar, conv_qw_std), or why they were deleted in the new version.

Thanks!

Loading a Model

Thank you for sharing this code base! I have a trained BNN and am having some trouble defining my trained model after loading up the state dictionary for the trained model. Specifically, I am getting an error when calling the getModel() function. There is some code below which reproduces the error, which seems to be an issue with creating the path to save the means and variances. If I turn off the saving feature, I get another TypeError related to the model and layer classes. Can you provide an example of loading up a trained model or some additional insight? Thanks again!

--

from __future__ import print_function

import os
import argparse

import torch
import numpy as np
from torch.optim import Adam
from torch.nn import functional as F

import data
import utils
import metrics
import config_bayesian as cfg
from models.BayesianModels.Bayesian3Conv3FC import BBB3Conv3FC
from models.BayesianModels.BayesianAlexNet import BBBAlexNet
from models.BayesianModels.BayesianLeNet import BBBLeNet
from models.BayesianModels.BayesianFeedForward import BBBFeedForward
from main_bayesian import getModel

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net_type = 'lenet'
trainset, testset, inputs, outputs = data.getDataset('MNIST')
net = getModel(net_type, inputs, outputs).to(device)

num_ens

Hi, I'm trying to apply your framework to a regression problem.
In your default configuration file, the num_ens variable is set to 1.
I was thinking about its influence on training, since it is the number of samples used to estimate the objective function.

What is the optimal value for this parameter? I couldn't find any discussion of it in the paper!
Thanks for your help!
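For context, num_ens controls how many stochastic forward passes are combined per minibatch. One common way to combine them for classification is to average the predicted probabilities, i.e. a log-mean-exp over the per-pass log-softmax outputs, with the KL averaged over the passes as well. The sketch assumes that scheme (function names are hypothetical). More passes lower the variance of the Monte Carlo estimate at a proportional compute cost; no particular value is claimed to be optimal.

```python
import math


def logmeanexp(values):
    """log(mean(exp(values))), computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def ensemble_log_prob(sample_log_probs):
    """sample_log_probs: num_ens lists of per-class log-probabilities for one input."""
    num_classes = len(sample_log_probs[0])
    return [logmeanexp([s[c] for s in sample_log_probs]) for c in range(num_classes)]


# Two forward passes (num_ens = 2) for a 2-class problem.
passes = [[math.log(0.2), math.log(0.8)], [math.log(0.4), math.log(0.6)]]
print([round(math.exp(v), 6) for v in ensemble_log_prob(passes)])  # [0.3, 0.7]
```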

testset is used in training function

In line 173, m is computed as
m = math.ceil(len(testset) / cf.batch_size)
However, this is inside the training function; should it be len(trainset)? The test set should not normally be seen during the training phase.
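For context: if m is the number of minibatches per epoch, it should indeed come from the training set. In Bayes by Backprop, m is typically used to split the KL term across minibatches, either uniformly (1/m) or with the scheme of Blundell et al. (2015), pi_i = 2^(m - i) / (2^m - 1) for minibatch i = 1..m; both sum to 1 over an epoch. A sketch assuming m is used this way in the training loop (the function name is hypothetical):

```python
import math


def kl_weights(num_train, batch_size, scheme="blundell"):
    m = math.ceil(num_train / batch_size)  # minibatches per epoch, from the training set
    if scheme == "blundell":
        return [2 ** (m - i) / (2 ** m - 1) for i in range(1, m + 1)]
    if scheme == "uniform":
        return [1.0 / m] * m
    raise ValueError("unknown scheme")


w = kl_weights(num_train=1000, batch_size=256)
print(len(w), [round(v, 4) for v in w])  # 4 [0.5333, 0.2667, 0.1333, 0.0667]
```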

Can I drop the KL term in loss function.

Hi, I want to know whether I can drop the KL term, i.e., set beta=0 in the loss function. Will it hurt the training process? Is it still Bayes by Backprop? Thanks, and looking forward to your reply 😄.

about resume or testonly

Hi, I save the best BCNN model during training and then resume from the checkpoint, but I found that the training accuracy dropped from 90%+ to 80%+. I also completed the testOnly option, but the test accuracy of the saved best model is much poorer than the best test accuracy obtained when alternating training and testing. I guess that since the weights are distributions, the performance of the saved best model can still fluctuate within a certain range. But if so, I can hardly save the best parameters and reuse them directly later. What do you think? Thanks!

[Question] Regression

How should I modify the code to apply it to regression instead of classification? Is it enough to put one output neuron and to change CrossEntropyLoss to MSELoss?

BBBLayers

Hi,

The BBBLayers_ file has an error in the __init__ methods. I have modified it for my use, but I think it's worth noting that (p or q)_logvar_init are non-default arguments defined after default arguments (which gives a SyntaxError).

Not a big problem anyway - just a small bug :) Might be wrong on my part.

[Question]Regression

I haven't been able to replicate regression. I am trying to replicate the results from this post, but it seems you have used criterion = metrics.ELBO() and not CrossEntropyLoss().

Am I missing something? Could you post a simple regression example?

Originally posted by @kumar-shridhar in #9 (comment)

Bugs in main_Bayes.py

Hello,

I tried to run a Bayesian LeNet5 on MNIST. However, I have difficulties running main_Bayes.py. It seems to me there are multiple bugs in the files (the Flatten layer is not defined, arguments without defaults are defined after ones with defaults, a string is called as a function, etc.). Could you let me know how to call your code properly?

Best

what's the difference between your BCNN and BCNN with dropout in testing?

Hi, I've been following your work for a few months. To put it simply, your model randomly samples the weights w to turn a traditional CNN into a Bayesian one, where w is drawn from a Gaussian distribution (e.g. line 148 in BBBlayers.py: weight = self.conv_qw_mean + sigma_weight * self.eps_weight.normal_()). But in another BCNN model (Deep Bayesian Active Learning, https://arxiv.org/pdf/1703.02910.pdf), the author considers the Bayesian character to lie in using dropout when testing the model (see the issue I discussed with damienlancry: Riashat/Deep-Bayesian-Active-Learning#5), while the other modules (e.g. the layers and the weights) remain unchanged.

I'm really confused about what Bayesian actually means in a neural network like a CNN. I'm really looking forward to your reply!

Python package restructuring

It would be great if this was an actual Python package that can be installed with pip install -e .
It would require some restructuring and a setup.py, but no code changes.
I don't know if it's something you wanted to do, but it would be beneficial for people using the library.

Discussion of kl_ in the loss function

I see you compute kl_ in your code like this:
kl_ = math.log(self.q_logvar_init) - self.sigma_weight + (sig_weight**2 + self.mu_weight**2) / (2 * self.q_logvar_init ** 2) - 0.5.
I can't map the above formula to anything in your paper.
Can you explain why you use this formula?
Please help me! :(.
Thanks
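The expression appears to be the closed-form KL divergence between a Gaussian posterior N(mu, sigma^2) and a zero-mean Gaussian prior N(0, sigma_p^2): KL = log(sigma_p) - log(sigma) + (sigma^2 + mu^2) / (2 * sigma_p^2) - 1/2. That reading treats self.q_logvar_init as the prior standard deviation sigma_p, self.sigma_weight as log(sigma), and the squared terms as sigma^2 and mu^2; these are assumptions inferred from the formula. The sketch below checks the general closed form against a Monte Carlo estimate of E_q[log q(w) - log p(w)] (function names are hypothetical).

```python
import math
import random


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for one weight."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)


def log_normal(w, mu, sigma):
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (w - mu) ** 2 / (2 * sigma ** 2)


def kl_monte_carlo(mu_q, sigma_q, mu_p, sigma_p, n, rng):
    """Estimate E_q[log q(w) - log p(w)] by sampling w ~ q."""
    total = 0.0
    for _ in range(n):
        w = rng.gauss(mu_q, sigma_q)
        total += log_normal(w, mu_q, sigma_q) - log_normal(w, mu_p, sigma_p)
    return total / n


# Posterior N(0.05, 0.05^2) against the prior N(0, 0.1^2).
exact = kl_gaussian(0.05, 0.05, 0.0, 0.1)
estimate = kl_monte_carlo(0.05, 0.05, 0.0, 0.1, 100000, random.Random(0))
print(exact, estimate)
```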

Bayesian Model output size changes with input image size

Congratulations on this wonderful achievement of the paper.
I am trying to implement the Bayesian CNN for a project of mine, a simple binary classification. Here is the code to define the model:

import torch.nn as nn
import math
from layers import BBB_Linear, BBB_Conv2d
from layers import BBB_LRT_Linear, BBB_LRT_Conv2d
from layers import FlattenLayer, ModuleWrapper

class get_model(ModuleWrapper):
    def __init__(self, outputs, inputs, priors=None, layer_type='bbb', activation_type='softplus'):
        super(get_model, self).__init__()

        self.num_classes = outputs
        self.layer_type = layer_type
        self.priors = priors

        if layer_type=='lrt':
            BBBLinear = BBB_LRT_Linear
            BBBConv2d = BBB_LRT_Conv2d
        elif layer_type=='bbb':
            BBBLinear = BBB_Linear
            BBBConv2d = BBB_Conv2d
        else:
            raise ValueError("Undefined layer_type")
        
        if activation_type=='softplus':
            self.act = nn.Softplus
        elif activation_type=='relu':
            self.act = nn.ReLU
        else:
            raise ValueError("Only softplus or relu supported")

        self.conv1 = BBBConv2d(inputs, 64, 11, stride=4, padding=5, bias=True, priors=self.priors)
        self.act1 = self.act()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = BBBConv2d(64, 192, 5, padding=2, bias=True, priors=self.priors)
        self.act2 = self.act()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv3 = BBBConv2d(192, 384, 3, padding=1, bias=True, priors=self.priors)
        self.act3 = self.act()

        self.conv4 = BBBConv2d(384, 256, 3, padding=1, bias=True, priors=self.priors)
        self.act4 = self.act()

        self.conv5 = BBBConv2d(256, 128, 3, padding=1, bias=True, priors=self.priors)
        self.act5 = self.act()
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.flatten = FlattenLayer(1 * 1 * 128)

        self.fc1 = BBBLinear(128, 64, bias=True, priors=self.priors)
        self.act6 = self.act()
        
        self.fc2 = BBBLinear(64, 32, bias=True, priors=self.priors)
        self.act7 = self.act()
        
        self.fc3 = BBBLinear(32, outputs, bias=True, priors=self.priors)

model = get_model(outputs=2, inputs=1).to(device)

However, when i input a random input like,

model(torch.randn(1,1,64,64).to(device))[0].size()

the output has shape (4,2), whereas it should be (1,2). Peculiarly, the model outputs (1,2) only when it gets an input of shape (1,32,32), like the MNIST dataset. However, I could not find where the image size is fixed to 32x32.

Looking forward to your support!
Thanks
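A likely explanation, checked with the standard output-size formula floor((n + 2p - k) / s) + 1: with a 32x32 input the last pooling layer produces a 1x1x128 map, matching FlattenLayer(1 * 1 * 128), but a 64x64 input produces 2x2x128 = 512 values per image, and flattening into rows of 128 turns one image into 4 rows, hence the (4, 2) output. The helper below traces the spatial size through the layer stack in the code above (layer list transcribed from it; helper names are hypothetical). Using FlattenLayer(2 * 2 * 128) with BBBLinear(512, 64) would be one way to adapt the model to 64x64 inputs.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (no dilation, floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for each conv/pool layer in get_model, in order.
LAYERS = [
    (11, 4, 5),  # conv1
    (2, 2, 0),   # pool1
    (5, 1, 2),   # conv2
    (2, 2, 0),   # pool2
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (2, 2, 0),   # pool3
]


def final_spatial_size(n):
    for kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
    return n


for side in (32, 64):
    s = final_spatial_size(side)
    print(side, "->", s, "x", s, "x 128 =", s * s * 128, "features per image")
```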

Old version of KL Divergence

Hi @kumar-shridhar @Piyush-555 ,

I am currently working on a project utilizing Bayes by Backprop for image reconstruction with autoencoders, and it works well. However, I have a question regarding the old version of the KL divergence calculation:

def kl_loss(self): return self.weight.nelement() / self.log_alpha.nelement() * calculate_kl(self.log_alpha)

def calculate_kl(log_alpha): return 0.5 * torch.sum(torch.log1p(torch.exp(-log_alpha)))

I do not understand, how this is derived. Is the mean of prior and posterior of the weights assumed to be zero? I do not have another explanation.

I hope you can explain that to me. Thank you very much in advance!

I have no idea why the Superresolution model is Bayesian; I think it uses a normal Conv2d in model.py

self.relu = nn.ReLU()
self.conv1 = nn.Conv2d(1, 64, (5, 5), (1, 1), (2, 2))
self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1))
self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1))
self.conv4 = nn.Conv2d(32, upscale_factor ** 2, (3, 3), (1, 1), (1, 1))
self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

Discrepancy in your AlexNet model

Hello.
I noticed you didn't include the KL term for the CNN layers in your loss. Was this intentional, since the previous AlexNet model still uses that term?

about softplus

Hi, how do you implement softplus in your code?
Thanks!

about local reparameterization trick

Hi, in your new code, there is no implementation about the local reparameterization trick. If I want to use it to compute the output of the convolution layer, I should replace sigma^2 with alpha * mu^2 (just like the implementation in your old code). But should I also replace sigma^2 with alpha * mu^2 when computing the KL divergence? i.e. should KL also be affected by the parameter alpha? Why or why not? Thanks!
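To make the α*μ² parameterization concrete: if each weight's variance is σ² = α·μ², the local reparameterization trick gives a pre-activation mean of Σ x·μ and a variance of α·Σ x²·μ². The KL is always taken between the posterior actually used and the prior, so if the forward pass uses σ² = α·μ², any KL with a Gaussian posterior has to be evaluated with those same per-weight σ values, which makes it depend on α. A sketch of the forward computation under that assumption (hypothetical names):

```python
import math
import random


def lrt_alpha_sample(x, w_mu, alpha, rng):
    """Pre-activation sample when every weight has variance alpha * mu^2."""
    act_mu = sum(xi * mu for xi, mu in zip(x, w_mu))
    act_var = alpha * sum(xi ** 2 * mu ** 2 for xi, mu in zip(x, w_mu))
    return act_mu + math.sqrt(act_var) * rng.gauss(0.0, 1.0)


def posterior_sigma(w_mu, alpha):
    # The per-weight standard deviations implied by sigma^2 = alpha * mu^2.
    return [math.sqrt(alpha) * abs(mu) for mu in w_mu]


x, w_mu, alpha = [1.0, 2.0], [0.5, -0.25], 0.16
print(posterior_sigma(w_mu, alpha))  # sigma = sqrt(alpha) * |mu| for each weight
```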

about uncertainties

Hi, is there any way to compute the aleatoric and epistemic uncertainty of each class of the model (BCNN), rather than for just one sample?

How to use this approach in case of resnet architecture

This is not an issue, but I'm not sure where else to ask this question, so I'm posting it here.
From the models I can see that they are purely sequential and don't need a forward function because of the common wrapper. I tried to follow this approach for a ResNet architecture, but I keep ending up with 14% accuracy.
Would it be possible for you to provide an example of this (just the model)?

Replicate paper results

Hi,

thanks for this nice work, I really appreciate it!
I tried to replicate the results from your paper with the repository, but I have not succeeded.

First, I downloaded your repo and the datasets. Then I adapted the configuration for the Bayesian Networks:

############### Configuration file for Bayesian ###############
n_epochs = 100
lr_start = 0.001
num_workers = 4
valid_size = 0.2
batch_size = 256
train_ens = 10
valid_ens = 10

Finally, I ran the evaluation script with main_bayesian.py --net_type alexnet --dataset CIFAR10, but the network cannot get past a validation accuracy of around 58%:

Epoch: 20 	Training Loss: 2502238.8933 	Training Accuracy: 0.5926 	Validation Loss: 23935792.3000 	Validation Accuracy: 0.5635 	train_kl_div: 2445767.9315
Validation loss decreased (25103334.600000 --> 23935792.300000).  Saving model ...
Epoch: 21 	Training Loss: 2387042.4522 	Training Accuracy: 0.5968 	Validation Loss: 22808838.9500 	Validation Accuracy: 0.5584 	train_kl_div: 2330986.6338
Validation loss decreased (23935792.300000 --> 22808838.950000).  Saving model ...
Epoch: 22 	Training Loss: 2274617.4682 	Training Accuracy: 0.6079 	Validation Loss: 21713194.5000 	Validation Accuracy: 0.5725 	train_kl_div: 2219840.2803
Validation loss decreased (22808838.950000 --> 21713194.500000).  Saving model ...
Epoch: 23 	Training Loss: 2166076.8439 	Training Accuracy: 0.6137 	Validation Loss: 20656232.4500 	Validation Accuracy: 0.5872 	train_kl_div: 2112406.7866
Validation loss decreased (21713194.500000 --> 20656232.450000).  Saving model ...
Epoch: 24 	Training Loss: 2061628.5510 	Training Accuracy: 0.6183 	Validation Loss: 19644658.9000 	Validation Accuracy: 0.5701 	train_kl_div: 2008751.9745
Validation loss decreased (20656232.450000 --> 19644658.900000).  Saving model ...
Epoch: 25 	Training Loss: 1961447.8232 	Training Accuracy: 0.6230 	Validation Loss: 18666660.6500 	Validation Accuracy: 0.5711 	train_kl_div: 1908918.2803
Validation loss decreased (19644658.900000 --> 18666660.650000).  Saving model ...
Epoch: 26 	Training Loss: 1864639.9626 	Training Accuracy: 0.6289 	Validation Loss: 17726859.6500 	Validation Accuracy: 0.5758 	train_kl_div: 1812952.8240
Validation loss decreased (18666660.650000 --> 17726859.650000).  Saving model ...
Epoch: 27 	Training Loss: 1771119.3846 	Training Accuracy: 0.6386 	Validation Loss: 16825135.8500 	Validation Accuracy: 0.5862 	train_kl_div: 1720880.1863
Validation loss decreased (17726859.650000 --> 16825135.850000).  Saving model ...
Epoch: 28 	Training Loss: 1682560.0645 	Training Accuracy: 0.6406 	Validation Loss: 15963687.3750 	Validation Accuracy: 0.5892 	train_kl_div: 1632709.2596
Validation loss decreased (16825135.850000 --> 15963687.375000).  Saving model ...
Epoch: 29 	Training Loss: 1597318.2373 	Training Accuracy: 0.6459 	Validation Loss: 15150667.4000 	Validation Accuracy: 0.5615 	train_kl_div: 1548427.4435
Validation loss decreased (15963687.375000 --> 15150667.400000).  Saving model ...
Epoch: 30 	Training Loss: 1516623.9817 	Training Accuracy: 0.6498 	Validation Loss: 14361168.3500 	Validation Accuracy: 0.5829 	train_kl_div: 1467998.1879
Validation loss decreased (15150667.400000 --> 14361168.350000).  Saving model ...
Epoch: 31 	Training Loss: 1439714.2970 	Training Accuracy: 0.6520 	Validation Loss: 13613963.9500 	Validation Accuracy: 0.5829 	train_kl_div: 1391386.5470
Validation loss decreased (14361168.350000 --> 13613963.950000).  Saving model ...
Epoch: 32 	Training Loss: 1366105.2030 	Training Accuracy: 0.6600 	Validation Loss: 12909336.8000 	Validation Accuracy: 0.5755 	train_kl_div: 1318524.9443
Validation loss decreased (13613963.950000 --> 12909336.800000).  Saving model ...
Epoch: 33 	Training Loss: 1296600.1863 	Training Accuracy: 0.6617 	Validation Loss: 12236651.6000 	Validation Accuracy: 0.5815 	train_kl_div: 1249338.7006
Validation loss decreased (12909336.800000 --> 12236651.600000).  Saving model ...
Epoch: 34 	Training Loss: 1230397.9889 	Training Accuracy: 0.6638 	Validation Loss: 11600143.9500 	Validation Accuracy: 0.5893 	train_kl_div: 1183742.5000
Validation loss decreased (12236651.600000 --> 11600143.950000).  Saving model ...
Epoch: 35 	Training Loss: 1168005.8073 	Training Accuracy: 0.6705 	Validation Loss: 11004782.7250 	Validation Accuracy: 0.5683 	train_kl_div: 1121634.4037
Validation loss decreased (11600143.950000 --> 11004782.725000).  Saving model ...
Epoch: 36 	Training Loss: 1109223.4610 	Training Accuracy: 0.6687 	Validation Loss: 10435876.3750 	Validation Accuracy: 0.5749 	train_kl_div: 1062898.7377
Validation loss decreased (11004782.725000 --> 10435876.375000).  Saving model ...
Epoch: 37 	Training Loss: 1053834.6206 	Training Accuracy: 0.6691 	Validation Loss: 9895180.6000 	Validation Accuracy: 0.5803 	train_kl_div: 1007417.1760
Validation loss decreased (10435876.375000 --> 9895180.600000).  Saving model ...
Epoch: 38 	Training Loss: 1001452.7830 	Training Accuracy: 0.6708 	Validation Loss: 9391186.8750 	Validation Accuracy: 0.5642 	train_kl_div: 955054.9248
Validation loss decreased (9895180.600000 --> 9391186.875000).  Saving model ...
Epoch: 39 	Training Loss: 951858.5939 	Training Accuracy: 0.6717 	Validation Loss: 8913133.8750 	Validation Accuracy: 0.5767 	train_kl_div: 905697.4590
Validation loss decreased (9391186.875000 --> 8913133.875000).  Saving model ...
Epoch: 40 	Training Loss: 905384.9124 	Training Accuracy: 0.6734 	Validation Loss: 8459427.5000 	Validation Accuracy: 0.5760 	train_kl_div: 859194.2727
Validation loss decreased (8913133.875000 --> 8459427.500000).  Saving model ...
Epoch: 41 	Training Loss: 861708.2651 	Training Accuracy: 0.6720 	Validation Loss: 8040532.7500 	Validation Accuracy: 0.5767 	train_kl_div: 815417.2926
Validation loss decreased (8459427.500000 --> 8040532.750000).  Saving model ...
Epoch: 42 	Training Loss: 820970.3232 	Training Accuracy: 0.6684 	Validation Loss: 7639982.5250 	Validation Accuracy: 0.5765 	train_kl_div: 774222.8085
Validation loss decreased (8040532.750000 --> 7639982.525000).  Saving model ...
Epoch: 43 	Training Loss: 782426.8826 	Training Accuracy: 0.6698 	Validation Loss: 7267536.7375 	Validation Accuracy: 0.5746 	train_kl_div: 735472.4554

Can you explain how to replicate the results from the paper?

KL divergence

Hi,
In BBBlayers.py you compute KL[p || q], but according to the ELBO formula you should compute KL[q || p]. Why is that? Thanks!
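For reference, the ELBO contains KL[q(w|θ) || p(w)], with the variational posterior q first. For univariate Gaussians the closed form is KL(N(μ_q, σ_q²) || N(μ_p, σ_p²)) = log(σ_p/σ_q) + (σ_q² + (μ_q - μ_p)²) / (2σ_p²) - 1/2. The plain-Python sketch below is a general illustration, not a statement of what BBBlayers.py computes. It uses a hypothetical helper `gaussian_kl` with illustrative numbers to show that the divergence is asymmetric, so swapping the arguments changes the quantity being minimized.

```python
import math


def gaussian_kl(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalars (hypothetical helper)."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


# Illustrative variational posterior q and prior p for a single weight: (mean, std).
q = (0.5, 0.1)
p = (0.0, 1.0)

kl_qp = gaussian_kl(*q, *p)  # KL[q || p]: the direction that appears in the ELBO
kl_pq = gaussian_kl(*p, *q)  # KL[p || q]: the reversed direction

print(f"KL[q||p] = {kl_qp:.4f}")
print(f"KL[p||q] = {kl_pq:.4f}")
```

With a narrow posterior and a wide prior, the reversed direction is much larger, because it heavily penalizes prior mass that falls where q is nearly zero.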

Possible bug in BBBlayers

It seems to me that there is a bug in BBBlayers, in both BBBConv2d and BBBLinearFactorial.
You declare the class fields self.conv_qw and self.fc_qw, which are used for computing the KL divergence. The problem is that these parameters are never applied to the actual data: in the probforward methods you use self.qw_mean instead. So it looks as if the prior is imposed not on the model weights that are used for the task but on a different set of weights that is never used.
@kumar-shridhar
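The point of this report is that the KL term must be computed from the same variational parameters that generate the weights in the forward pass. Below is a minimal sketch of that pattern. It assumes a factorized Gaussian posterior with σ = softplus(ρ), a zero-mean Gaussian prior and no bias. The class and attribute names are hypothetical and are not the repository's BBBConv2d or BBBLinearFactorial.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedParamBayesLinear(nn.Module):
    """Hypothetical Bayes-by-Backprop linear layer (not the repository's class).

    The same (weight_mu, weight_rho) tensors are used both to sample weights in
    forward() and to compute the KL term, so the prior acts on the weights that
    actually produce the predictions.
    """

    def __init__(self, in_features: int, out_features: int, prior_sigma: float = 1.0):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features).normal_(0.0, 0.1))
        # sigma = softplus(rho) keeps the standard deviation positive.
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.prior_sigma = prior_sigma

    def weight_sigma(self) -> torch.Tensor:
        return F.softplus(self.weight_rho)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterization: w = mu + sigma * eps, with eps ~ N(0, I).
        eps = torch.randn_like(self.weight_mu)
        weight = self.weight_mu + self.weight_sigma() * eps
        return F.linear(x, weight)

    def kl(self) -> torch.Tensor:
        # KL[q || p] against a N(0, prior_sigma^2) prior, computed from the SAME
        # parameters that forward() samples from.
        sigma_q = self.weight_sigma()
        sigma_p = self.prior_sigma
        return (
            torch.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + self.weight_mu ** 2) / (2 * sigma_p ** 2)
            - 0.5
        ).sum()


layer = SharedParamBayesLinear(4, 3)
x = torch.randn(2, 4)
out = layer(x)
kl = layer.kl()
(out.sum() + kl).backward()
```

Because `weight_mu` and `weight_rho` receive gradients from both the data term and the KL term, the prior regularizes the weights that are actually used for predictions.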

about q_logvar_init and p_logvar_init

Hi, in your new code, q_logvar_init and p_logvar_init are parameters of BBBconv2d (in your old code, they had fixed values). What are their initial values? I couldn't find them in your code. Thanks!

BayesianCNN for SuperResolution

Hi, it seems to me there is no Bayesian-related code in your SuperResolution implementation, just a regular CNN. What am I missing here? Could you please give me a hint? Many thanks!

CPU/GPU Spec you've used for running main_bayesian.py

Hi, I'm studying Bayesian CNNs these days, and I've tried to run main_bayesian.py on a MacBook Pro (16-inch, 2019, 16 GB RAM).

The n_epochs and batch_size hyperparameters you set are 200 and 256, respectively, but I'm afraid my MacBook cannot handle the load of training the BCNN. So I changed the number of epochs and the batch size to 10 and 100, respectively.

image

As shown in the image above, the validation accuracy exceeds 0.97 (not bad, I think).

I'm curious about the hardware you used when designing and testing the model you've uploaded.

No "Applying two sequential convolutional operations"

Hi,
I was just looking at your paper and code, and I don't see any sign of the convolutional operation being applied twice. It is applied only once, here:

out = F.conv2d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)

It seems you did that in earlier versions of your code, as in this commented-out bit in BBBLayers_.py:

        # conv_qw_mean = F.conv2d(input=input, weight=weight, stride=self.stride, padding=self.padding,
        #                              dilation=self.dilation, groups=self.groups)
        # conv_qw_std = torch.sqrt(1e-8 + F.conv2d(input=input.pow(2), weight=torch.exp(self.log_alpha)*weight.pow(2),
        #                                          stride=self.stride, padding=self.padding, dilation=self.dilation, groups=self.groups)) 

Why is that?
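For context, the local reparametrization trick is what calls for two convolutions. One convolution applies the weight means to the input to get the activation mean. The other applies the weight variances to the squared input to get the activation variance. Activations are then sampled directly. Below is a minimal sketch of that idea. It assumes a factorized Gaussian over the conv weights with an explicit σ and no bias. The function name and tensors are hypothetical.

```python
import torch
import torch.nn.functional as F


def lrt_conv2d(x, weight_mu, weight_sigma, stride=1, padding=0):
    """Sample conv activations with the local reparametrization trick (hypothetical sketch)."""
    # Mean of the pre-activations: convolve the input with the weight means.
    act_mu = F.conv2d(x, weight_mu, stride=stride, padding=padding)
    # Variance of the pre-activations: convolve x^2 with the weight variances.
    act_var = F.conv2d(x.pow(2), weight_sigma.pow(2), stride=stride, padding=padding)
    # Small constant keeps sqrt well-behaved when the variance is zero.
    act_std = torch.sqrt(act_var + 1e-8)
    # One noise sample per activation instead of one per weight.
    eps = torch.randn_like(act_mu)
    return act_mu + act_std * eps


x = torch.randn(2, 3, 8, 8)
weight_mu = torch.randn(16, 3, 3, 3) * 0.1
weight_sigma = torch.full((16, 3, 3, 3), 0.05)
out = lrt_conv2d(x, weight_mu, weight_sigma, padding=1)
```

The commented-out code quoted above parameterizes the weight variance as exp(log_alpha) * weight^2, a variational-dropout style parameterization. The sketch uses an explicit σ instead, but the two-convolution structure is the same.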

MNIST training data from Yann LeCun's site is not available (503 Service Unavailable)


When I tried to run uncertainty_estimation.py, it kept returning a 503 Service Unavailable error.

Since the link for downloading mnist/train-labels-idx1-ubyte.gz is now unavailable, could you change the code so that it no longer accesses it? (Even after I downloaded notMNIST_small.tar and put it into /data, the code still tries to reach the link and returns a 503 error.)


nan in loss

Running both main_Bayes.py and Bayesian_CNN_Detailed.ipynb in the 'Image Recognition' folder gives losses that become nan, caused by the kl returned from net.probforward(x).
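This is a general suggestion, not a diagnosis of this repository. A common source of nan in a Bayes-by-Backprop KL term is a standard deviation that underflows to zero, or a log-variance that overflows, so that log(σ) or exp(·) is no longer finite. Below is a sketch of a more defensive KL computation. It assumes σ = softplus(ρ) plus a small floor and a N(0, σ_p²) prior. The function name and the floor value are hypothetical.

```python
import torch
import torch.nn.functional as F


def safe_gaussian_kl(mu: torch.Tensor, rho: torch.Tensor, prior_sigma: float = 1.0,
                     min_sigma: float = 1e-6) -> torch.Tensor:
    """KL[N(mu, sigma^2) || N(0, prior_sigma^2)] summed over elements, with sigma = softplus(rho)."""
    # softplus stays finite for large |rho|; the floor keeps log(sigma) finite.
    sigma = F.softplus(rho) + min_sigma
    kl = (torch.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2)
          - 0.5)
    return kl.sum()


mu = torch.zeros(5)
rho = torch.tensor([-100.0, -10.0, 0.0, 10.0, 100.0])  # includes extreme values
kl = safe_gaussian_kl(mu, rho)

# A cheap guard during training: stop as soon as the KL term is no longer finite.
if not torch.isfinite(kl):
    raise RuntimeError("KL term is not finite")
```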

the version of PyTorch

Hi, I wonder which version of PyTorch you use. In the latest version (PyTorch 1.0), some settings cause errors, for example AttributeError: module 'torch._C' has no attribute '_cuda_setDevice' (raised at the line torch.cuda.set_device(0)).
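That AttributeError usually indicates a PyTorch build without CUDA support, such as a CPU-only install, on which torch.cuda.set_device is called. Below is a hedged, device-agnostic sketch that uses the GPU only when one is available. The model and batch are placeholders, not the repository's network.

```python
import torch
import torch.nn as nn

# Pick the GPU only if this PyTorch build has CUDA and a device is visible;
# otherwise fall back to the CPU instead of calling torch.cuda.set_device(0).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Placeholder model and batch, standing in for the repository's network and data.
model = nn.Linear(10, 2).to(device)
batch = torch.randn(4, 10, device=device)

with torch.no_grad():
    logits = model(batch)

print(device, tuple(logits.shape))
```

Moving the model and tensors with `.to(device)` keeps the same script runnable on both CPU-only and CUDA builds.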
