piEsposito / blitz-bayesian-deep-learning
A simple and extensible library to create Bayesian Neural Network layers on PyTorch.
License: GNU General Public License v3.0
I was taking a look at the examples you provided for training a BBP model, and I see that you're training with a batch size of 16 in the example below:
In the sample_elbo function you include the ability to pass a complexity_cost_weight, but at the moment this does not seem to be utilised. In section 3.4 of the Weight Uncertainty in Neural Networks paper, the authors suggest a method to calculate the minibatch weight:

pi_i = 2^(M - i) / (2^M - 1)
This shouldn't be too hard to implement and would help ensure that the loss is calculated correctly.
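(A minimal sketch of that schedule, assuming minibatches are indexed 1..M as in the paper:)

# Hedged sketch of the section 3.4 minibatch weighting, pi_i = 2^(M - i) / (2^M - 1):
# the weights sum to 1 over an epoch and concentrate the complexity cost on the
# first minibatches.
def minibatch_weight(batch_idx: int, num_batches: int) -> float:
    return 2 ** (num_batches - batch_idx) / (2 ** num_batches - 1)

# usage: loss = model.sample_elbo(..., complexity_cost_weight=minibatch_weight(i + 1, M))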
Hi! I'm trying to create a Siamese Network using Bayesian Layers. But I'm having the following issue:
/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/traitlets/config/application.py", line 664, in launch_instance
app.start()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 583, in start
self.io_loop.start()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 149, in start
self.asyncio_loop.run_forever()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/asyncio/base_events.py", line 442, in run_forever
self._run_once()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/asyncio/base_events.py", line 1462, in _run_once
handle._run()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/ioloop.py", line 690, in <lambda>
lambda f: self._run_callback(functools.partial(callback, future))
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/ioloop.py", line 743, in _run_callback
ret = callback()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 787, in inner
self.run()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 748, in run
yielded = self.gen.send(value)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 361, in process_one
yield gen.maybe_future(dispatch(*args))
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 541, in execute_request
user_expressions, allow_stdin,
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 300, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2858, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2886, in _run_cell
return runner(coro)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3063, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3254, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-78-4c5bd945864c>", line 12, in <module>
pred = sn(data1.to(torch.int64), data2.to(torch.int64))
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "<ipython-input-72-7e84bf9dfb73>", line 24, in forward
input_1 = self.e3(input_1)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/blitz/modules/linear_bayesian_layer.py", line 72, in forward
b = self.bias_sampler.sample()
File "/home/anatamais-t480/anaconda3/envs/torch-ds/lib/python3.6/site-packages/blitz/modules/weight_sampler.py", line 33, in sample
self.w = self.mu + self.sigma * self.eps_w
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-78-4c5bd945864c> in <module>
17
18 # Backpropagation
---> 19 loss.backward()
20
21 optimizer.step()
~/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
164 products. Defaults to ``False``.
165 """
--> 166 torch.autograd.backward(self, gradient, retain_graph, create_graph)
167
168 def register_hook(self, hook):
~/anaconda3/envs/torch-ds/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag
100
101
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
That's my code:
@variational_estimator
class SiameseNet(nn.Module):
    def __init__(self, maximo, vocab1, vocab2):
        super(SiameseNet, self).__init__()
        self.embedding1 = nn.Embedding(len(vocab1.keys()), maximo)
        self.gru1 = BayesianLSTM(maximo, maximo)
        self.e1 = BayesianLinear(maximo, 16)
        self.e2 = BayesianLinear(16, 8)
        self.e3 = BayesianLinear(8, 8)

    def euclidean_distance(self, input_1, input_2):
        input_1, input_2 = input_1[:, -1, :], input_2[:, -1, :]
        dist = ((input_1 - input_2) ** 2).sum(dim=1)
        return dist

    def forward(self, input_1, input_2):
        input_1 = self.embedding1(input_1)
        input_1, hidden1 = self.gru1(input_1)
        input_1 = self.e1(input_1)
        input_1 = self.e2(input_1)
        input_1 = self.e3(input_1)

        input_2 = self.embedding1(input_2)
        input_2, hidden2 = self.gru1(input_2)
        input_2 = self.e1(input_2)
        input_2 = self.e2(input_2)
        input_2 = self.e3(input_2)

        output = self.euclidean_distance(input_1, input_2)
        return torch.sigmoid(output)
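A possible workaround, offered as an untested sketch rather than a confirmed fix: the traceback points at the sampled bias tensor being modified in place between the two branch passes, so stacking the two inputs into one batch makes each Bayesian layer sample its weights exactly once per forward call. The sketch assumes both inputs share the same sequence length:

def forward(self, input_1, input_2):
    # Hedged sketch: process both siamese branches in a single pass so each
    # Bayesian layer's weight sampler runs only once per forward call.
    batch = torch.cat([input_1, input_2], dim=0)   # stack branches on the batch axis
    x = self.embedding1(batch)
    x, hidden = self.gru1(x)
    x = self.e3(self.e2(self.e1(x)))
    out_1, out_2 = torch.chunk(x, 2, dim=0)        # split back into the two branches
    output = self.euclidean_distance(out_1, out_2)
    return torch.sigmoid(output)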
In the code below you set the loss equal to criterion(outputs, labels) on each iteration of the loop - as such, I don't believe this is accumulating the loss across samples?
blitz-bayesian-deep-learning/blitz/utils/variational_estimator.py
Lines 60 to 65 in c2482f2
In addition, you normalise the entire loss at the end; might it be more appropriate to normalise just the sampled losses?
Finally, in the paper the authors subtract the data-dependent part from the KL divergence, whereas in your code you seem to sum them. Would it be correct to change the sign here - or would it depend entirely on the criterion defined?
The changes I'm proposing are summarised in the code below:
loss = 0
for _ in range(sample_nbr):
    outputs = self(inputs)
    loss += self.nn_kl_divergence() * complexity_cost_weight
loss /= sample_nbr
return loss - criterion(outputs, labels)
I found a weird bug; I was able to reproduce it with the bayesian_LeNet_mnist example by reducing the number of parameters.
I have no idea why, but based on my tests it occurs randomly during training (though with the same seed it always triggers at the same iteration). And I was able to reproduce it only with small networks; if I add more neurons or more layers, the problem never happens.
Traceback (most recent call last):
File "networks/test.py", line 66, in <module>
main()
File "networks/test.py", line 40, in main
loss = classifier.sample_elbo(inputs=datapoints.to(device),
File "venv/lib/python3.8/site-packages/blitz/utils/variational_estimator.py", line 65, in sample_elbo
outputs = self(inputs)
File "venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "networks/test.py", line 28, in forward
out = self.fc3(out)
File "venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "venv/lib/python3.8/site-packages/blitz/modules/linear_bayesian_layer.py", line 93, in forward
self.log_prior = self.weight_prior_dist.log_prior(w) + b_log_prior
File "venv/lib/python3.8/site-packages/blitz/modules/weight_sampler.py", line 84, in log_prior
prob_n1 = torch.exp(self.dist1.log_prob(w))
File "venv/lib/python3.8/site-packages/torch/distributions/normal.py", line 73, in log_prob
self._validate_sample(value)
File "venv/lib/python3.8/site-packages/torch/distributions/distribution.py", line 277, in _validate_sample
raise ValueError('The value argument must be within the support')
ValueError: The value argument must be within the support
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from blitz.modules import BayesianConv2d, BayesianLinear
from blitz.utils import variational_estimator

def main():
    train_dataset = dsets.MNIST(root="./cache", train=True, transform=transforms.ToTensor(), download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
    test_dataset = dsets.MNIST(root="./cache", train=False, transform=transforms.ToTensor(), download=True)
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=True)

    @variational_estimator
    class BayesianCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc2 = BayesianLinear(784, 10)
            self.fc3 = BayesianLinear(10, 10)

        def forward(self, x):
            out = x.view(x.size(0), -1)
            out = F.relu(self.fc2(out))
            out = self.fc3(out)
            return out

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    classifier = BayesianCNN().to(device)
    optimizer = optim.Adam(classifier.parameters(), lr=0.001)
    criterion = torch.nn.CrossEntropyLoss()

    iteration = 0
    for epoch in range(100):
        for i, (datapoints, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            loss = classifier.sample_elbo(inputs=datapoints.to(device),
                                          labels=labels.to(device),
                                          criterion=criterion,
                                          sample_nbr=3,
                                          complexity_cost_weight=1 / 50000)
            # print(loss)
            loss.backward()
            optimizer.step()

            iteration += 1
            if iteration % 250 == 0:
                print(loss)
                correct = 0
                total = 0
                with torch.no_grad():
                    for data in test_loader:
                        images, labels = data
                        outputs = classifier(images.to(device))
                        _, predicted = torch.max(outputs.data, 1)
                        total += labels.size(0)
                        correct += (predicted == labels.to(device)).sum().item()
                print('Iteration: {} | Accuracy of the network on the 10000 test images: {} %'
                      .format(str(iteration), str(100 * correct / total)))

if __name__ == '__main__':
    main()
I really like the idea of this library. However, it is really hard to get started with it.
For example, the minimal working example in the readme doesn't work. No matter what I try, I cannot get it to improve the accuracy. The loss decreases towards zero, but the accuracy doesn't change. The problem is that it is totally unclear why this is the case. Is it because:
Any help would be appreciated.
Hi, Blitz is very useful, but I have a question.
When I reduced the size of the dataset, the prediction results became much worse.
Changing the parameters prior_sigma_1, prior_sigma_2, and posterior_rho_init seems to improve the predictions.
Therefore, can you give me some advice on how to initialize these parameters?
Best regards!
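(For context, a minimal sketch of how these hyperparameters can be passed; the keyword names are taken from this thread and assumed to match BayesianLinear's signature, and the values are illustrative starting points, not recommendations from the repo:)

from blitz.modules import BayesianLinear

# Hedged sketch: tighter priors and a more negative posterior_rho_init
# (i.e. a smaller initial posterior sigma) are one common starting point
# for small datasets.
blinear = BayesianLinear(13, 1,
                         prior_sigma_1=0.1,        # assumed kwarg, per the question
                         prior_sigma_2=0.002,      # assumed kwarg
                         posterior_rho_init=-7.0)  # sigma = log1p(exp(rho)) starts small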
Hey, I may make a PR for adding the convtranspose soon, was just wondering if it was a planned feature in the near-term, thanks!
Hi. Thank you for the easy-to-use library. I am using it for a project where I create an encoder-decoder network using LSTMs. The last layer of the model is a linear layer, and it works as a multi-class classification problem. I am trying to replace the LSTM and the last linear layer with their Bayesian alternatives. I am very new to BNNs. I don't quite understand how the uncertainty can be measured in this case, so that the classifier doesn't predict anything if the uncertainty is high. Any help will be appreciated. Thank you.
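(A minimal sketch of one common approach, assuming a classifier built from stochastic Bayesian layers; model, inputs, and the threshold are hypothetical placeholders, not part of the blitz API:)

import torch
import torch.nn.functional as F

# Hedged sketch: estimate predictive uncertainty by averaging softmax outputs
# over several stochastic forward passes and abstaining when the predictive
# entropy is too high.
def predict_or_abstain(model, x, n_samples=25, entropy_threshold=1.0):
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)                                # predictive distribution
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    preds = mean_probs.argmax(dim=-1)
    preds[entropy > entropy_threshold] = -1                       # -1 marks "abstain"
    return preds, entropy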
The sample_elbo function is documented as follows:
"The ELBO Loss consists of the sum of the KL Divergence of the model (explained above, interpreted as a "complexity part" of the loss) with the actual criterion - (loss function) of optimization of our model (the performance part of the loss)."
But most others calculate ELBO = -log_likelihood + KL.
I don't think "the actual criterion" and "-log_likelihood" are equivalent.
Specifically, I want to know why the loss in the regression task includes MSE, as follows:
def sample_elbo(self, inputs, labels, criterion, sample_nbr, complexity_cost_weight=1):
    loss = 0
    for _ in range(sample_nbr):
        outputs = self(inputs)
        loss += criterion(outputs, labels)
        loss += self.nn_kl_divergence() * complexity_cost_weight
    return loss / sample_nbr
I think a log-Gaussian should be used, since the loss = -log_likelihood + KL.
When the task is classification, the log_likelihood can be calculated with torch.nn.CrossEntropyLoss().
But torch.nn.MSELoss() can't give the log_likelihood.
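(A worked note, for context: for a Gaussian likelihood with fixed variance, -log N(y | mu, sigma^2) = (y - mu)^2 / (2 sigma^2) + 0.5 log(2 pi sigma^2), so MSE is the negative log-likelihood up to a scale factor and an additive constant. A small sketch comparing the two; GaussianNLLLoss is available in torch >= 1.8:)

import torch

mu = torch.randn(8, 1)          # mock predictions
y = torch.randn(8, 1)           # mock targets
var = torch.ones_like(mu)       # fixed unit variance

mse = torch.nn.MSELoss()(mu, y)
nll = torch.nn.GaussianNLLLoss()(mu, y, var)
# With var = 1, nll == 0.5 * mse (the constant term is dropped by default),
# so minimizing MSE and minimizing the Gaussian NLL give the same optimum for mu.
print(mse.item(), nll.item())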
Hey there! Just want to say that I am really impressed by your repo. Good work!
Will there be options to expand the set of priors that may be accessible and posterior sampling methods such as that featured in https://arxiv.org/abs/1902.03932 ?
Thanks!
Hi there!
I have been building predictive posterior distributions for my predictive models after the application of a softmax label for the following classifier:
(classifier): Sequential(
  (0): Dropout(p=0.5, inplace=False)
  (1): BayesianLinear(
    (weight_sampler): TrainableRandomDistribution()
    (bias_sampler): TrainableRandomDistribution()
    (weight_prior_dist): PriorWeightDistribution()
    (bias_prior_dist): PriorWeightDistribution()
  )
  (2): ReLU(inplace=True)
  (3): Dropout(p=0.5, inplace=False)
  (4): BayesianLinear(
    (weight_sampler): TrainableRandomDistribution()
    (bias_sampler): TrainableRandomDistribution()
    (weight_prior_dist): PriorWeightDistribution()
    (bias_prior_dist): PriorWeightDistribution()
  )
  (5): ReLU(inplace=True)
  (6): BayesianLinear(
    (weight_sampler): TrainableRandomDistribution()
    (bias_sampler): TrainableRandomDistribution()
    (weight_prior_dist): PriorWeightDistribution()
    (bias_prior_dist): PriorWeightDistribution()
  )
)
For some reason, my posterior after the application of a softmax looks like the following:
array([[[0., 0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        ...,
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1.]], ....
With the following dimensionality: (50, 60, 6) <- [n_posterior, n_samples, n_classes]
I'm not sure why I get all ones and 0s, can you help explain why this may be the case? After averaging across the posterior predictive distribution, I am returned something more useful:
array([[0.5 , 0.  , 0.24, 0.02, 0.22, 0.02],
       [0.22, 0.  , 0.58, 0.  , 0.18, 0.02],
       [0.32, 0.02, 0.22, 0.04, 0.18, 0.22],
       [0.44, 0.  , 0.38, 0.  , 0.12, 0.06],
       [0.48, 0.  , 0.4 , 0.  , 0.1 , 0.02],
However, while the averaged predictive posterior can be analyzed for entropy, I'm afraid the entropy of the original predictive posterior before averaging is meaningless. Can you help me understand perhaps what can be changed to prohibit predictions of 0 and 1 and why the softmax may be yielding this?
From what I can tell from the latent distribution of data and the fact that predictive posterior label assignment fluctuates heavily, predictions seem highly uncertain and should be reflected by the original predictive posterior assignment. I think the reason for this is that when predicting using this framework, I have a pre-softmax distribution that contains multiple large numbers (see row 1):
array([[[ 3.94745656e+05, -1.36722781e+05,  2.51609000e+05,
         -2.27076387e+04,  1.94971672e+05, -1.56166104e+04],
        [ 5.99162695e+04, -1.85560625e+05,  2.40618594e+05,
          1.35033016e+05,  6.93285859e+04, -2.45788859e+05],
        [ 1.50803141e+05, -6.68518984e+04,  8.18610234e+04,
         -1.84584469e+05,  1.43614078e+05,  1.04057883e+05],
I really do not think I should be getting this kind of distribution for the posterior, though perhaps it reflects my setup or some instabilities in training these types of networks. Curious to hear your thoughts?
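(A hedged illustration of why the individual samples come out one-hot, offered as an assumption about the cause rather than a confirmed diagnosis:)

import torch
import torch.nn.functional as F

# Logits on the order of 1e5 saturate the softmax into exact one-hot vectors
# in float32, so each posterior sample looks deterministic even though the
# samples disagree with one another (which is why the sample average is
# informative again).
logits = torch.tensor([[3.9e5, -1.4e5, 2.5e5, -2.3e4, 1.9e5, -1.6e4]])
print(F.softmax(logits, dim=-1))        # tensor([[1., 0., 0., 0., 0., 0.]])

# At a sane logit scale the softmax is informative, so checking input
# standardization and weight/activation magnitudes is a reasonable first step.
print(F.softmax(logits / 1e5, dim=-1))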
I want to do some Bayesian deep learning research based on PyTorch 1.7, but I found that the package pins PyTorch 1.4 when I use pip to install it. Will you update the modules to support PyTorch 1.7? If you can, I will be very happy!
I found that if you use .double() to change the type of both the model and the data, the printed parameters stay unchanged (although the model seems to be updated). Is there any explanation?
Are there any plans to expand the set of Bayesian implementations for non-linear layers (e.g. sigmoid, tanh)? Or they already exist, but maybe I'm just failing to find them. I am more than glad to provide some time/effort towards the inclusion of this feature.
Cheers,
Jose
PS: Thanks for this library, I've tested 5+ different existing solutions, and Blitz is perfect for my goals of rapid prototyping and testing.
Hi (and thanks a bunch for this framework!),
I'm testing out a Bayesian neural net for a simple regression task. However, after a lot of training, when I test the output, I just get an (almost) constant output. I follow the same workflow as in the Boston Housing example, except I use a function to generate my dataset.
Here's my code, if you're interested:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.expand_dims(np.random.uniform(0, 10, 1000), -1)  # these two lines are the only thing I changed as far
y = np.sin(X)                                           # as data preprocessing is concerned

X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=.25,
                                                    random_state=42)

X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

ds_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)

@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        #self.linear = nn.Linear(input_dim, output_dim)
        self.blinear1 = BayesianLinear(1, 100)
        self.blinear2 = BayesianLinear(100, 100)
        self.blinear3 = BayesianLinear(100, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x_):
        x = self.sigmoid(self.blinear1(x_))
        x = self.sigmoid(self.blinear2(x))
        x = self.blinear3(x)
        return x

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
regressor = BayesianRegressor().to(device)
criterion = torch.nn.MSELoss()
optimizer = optim.Adam(regressor.parameters(), lr=0.005)

iteration = 0
hist = []
for epoch in range(1000):
    totalloss = 0
    u = 0
    for i, (datapoints, labels) in enumerate(dataloader_train):
        u += 1
        optimizer.zero_grad()
        loss = regressor.sample_elbo(inputs=datapoints.to(device),
                                     labels=labels.to(device),
                                     criterion=criterion,
                                     sample_nbr=3,
                                     complexity_cost_weight=1/X_train.shape[0])
        totalloss += loss.item()
        loss.backward()
        optimizer.step()
    hist.append(totalloss / u)
    print(f"[Epoch {epoch}] " + "Loss: {:.4f}".format(totalloss / u))
Then I generate the outputs:
plt.scatter(X, y)
plt.scatter(X, regressor(torch.tensor(X).float().to(device)).detach().cpu(), s=5)
plt.show()
Is there something I'm doing wrong here?
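(A hedged diagnostic sketch, reusing the names from the script above: draw several stochastic forward passes and plot the predictive mean with a +/- 2 std band, to see whether the mean is actually flat or the posterior uncertainty has collapsed. The sample count is arbitrary:)

X_t = torch.tensor(X).float().to(device)
with torch.no_grad():
    preds = torch.stack([regressor(X_t) for _ in range(50)]).squeeze(-1)
mean = preds.mean(dim=0).cpu().numpy()
std = preds.std(dim=0).cpu().numpy()

order = np.argsort(X.squeeze())
plt.scatter(X, y, s=5)
plt.plot(X.squeeze()[order], mean[order], 'r')
plt.fill_between(X.squeeze()[order], (mean - 2 * std)[order], (mean + 2 * std)[order], alpha=0.3)
plt.show()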
So I'm not quite sure... but requirements.txt includes a specific version of torch which is no longer available. So every time I try to install it, I get the error that the torch package is not available. I think your program is compatible with every version of pytorch from 1.3.1 up to the latest stable one, 1.5.1. So I would write the requirements.txt a bit differently:
torch>=1.3.1
Hi,
is there a possibility to initialize the mu's of a net that is using blitz layers with the parameters of its deterministic equivalent? To get a better understanding, I built a net with only one weight and one bias, so the weight can be interpreted as the slope and the bias as the intercept of a linear function. However, setting the mu of the bias and weight seems to only work for a frozen model.
import torch
import torch.nn as nn
from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator
import matplotlib.pyplot as plt
import numpy as np

@variational_estimator
class bayesian_Net(nn.Module):
    def __init__(self, freeze):
        super(bayesian_Net, self).__init__()
        self.bl1 = BayesianLinear(1, 1, bias=True, freeze=freeze)

    def forward(self, x):
        x = self.bl1(x)
        return x

unfrozen_net = bayesian_Net(freeze=False)
frozen_net = bayesian_Net(freeze=True)

slope = -0.5
intercept = 1

unfrozen_net.bl1.weight_mu = torch.nn.Parameter(torch.Tensor([[slope]]), requires_grad=True)
unfrozen_net.bl1.bias_mu = torch.nn.Parameter(torch.Tensor([intercept]), requires_grad=True)
frozen_net.bl1.weight_mu = torch.nn.Parameter(torch.Tensor([[slope]]), requires_grad=True)
frozen_net.bl1.bias_mu = torch.nn.Parameter(torch.Tensor([intercept]), requires_grad=True)

x = np.zeros((10, 1))
for k in range(1, 10):
    x[k] = k
x = torch.FloatTensor(x)

#plot
for k in range(100):
    plt.plot(x, unfrozen_net(x).detach().numpy(), 'r')
    plt.plot(x, frozen_net(x).detach().numpy(), 'k')
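(One possible explanation, offered as an assumption rather than a confirmed answer: the unfrozen net samples w = mu + sigma * eps on every forward pass, and sigma = log1p(exp(rho)) is initialized at a non-negligible value, so the sampled lines scatter around the copied slope and intercept. A sketch of also shrinking sigma when copying deterministic weights; the attribute names weight_rho and bias_rho are assumed from blitz's layer code:)

# Hedged sketch: push rho strongly negative so sigma = log1p(exp(rho)) ~ 0
# and the unfrozen net samples weights tightly around the copied mu values.
with torch.no_grad():
    unfrozen_net.bl1.weight_rho.fill_(-7.0)  # sigma = log1p(exp(-7)) ~ 9e-4
    unfrozen_net.bl1.bias_rho.fill_(-7.0)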
Hello,
first off thank you for sharing this project!
I am looking for a way to include uncertainties in my input data. Basically, I want to do a regression on experimental data, where some of the data is more accurately determined, and therefore I want it to have a larger "weight".
I looked through the examples and could not find a way to do this.
First, I want to confirm if what I want is possible.
Secondly, it would be most appreciated if anyone could point me in the right direction.
Thank you for your time reading this :)
I have been trying to use the Bayesian linear regression example given by the Blitz authors and parallelize the model by wrapping it with torch.nn.DataParallel. However, it seems that the given code only uses one GPU and not multiple GPUs. Below is the same code from the bayesian_regression_boston.py example with the model wrapped in DataParallel.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(np.expand_dims(y, -1))

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=.25,
                                                    random_state=42)

X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()

@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        #self.linear = nn.Linear(input_dim, output_dim)
        self.blinear1 = BayesianLinear(input_dim, 512)
        self.blinear2 = BayesianLinear(512, output_dim)

    def forward(self, x):
        print("\tIn Model: input size", x.size())
        x_ = self.blinear1(x)
        x_ = F.relu(x_)
        return self.blinear2(x_)

def evaluate_regression(regressor,
                        X,
                        y,
                        samples=100,
                        std_multiplier=2):
    preds = [regressor(X) for i in range(samples)]
    preds = torch.stack(preds)
    means = preds.mean(axis=0)
    stds = preds.std(axis=0)
    ci_upper = means + (std_multiplier * stds)
    ci_lower = means - (std_multiplier * stds)
    ic_acc = (ci_lower <= y) * (ci_upper >= y)
    ic_acc = ic_acc.float().mean()
    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower <= y).float().mean()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
regressor = BayesianRegressor(13, 1).to(device)
optimizer = optim.Adam(regressor.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    regressor = nn.DataParallel(regressor)
regressor.to(device)

ds_train = torch.utils.data.TensorDataset(X_train, y_train)
dataloader_train = torch.utils.data.DataLoader(ds_train, batch_size=16, shuffle=True)

ds_test = torch.utils.data.TensorDataset(X_test, y_test)
dataloader_test = torch.utils.data.DataLoader(ds_test, batch_size=16, shuffle=True)

iteration = 0
for epoch in range(100):
    for i, (datapoints, labels) in enumerate(dataloader_train):
        optimizer.zero_grad()
        print("Outside: input size", datapoints.size())
        loss = regressor.module.sample_elbo(inputs=datapoints.to(device),
                                            labels=labels.to(device),
                                            criterion=criterion,
                                            sample_nbr=1,
                                            complexity_cost_weight=1/X_train.shape[0])
        loss.backward()
        optimizer.step()

        iteration += 1
        if iteration % 100 == 0:
            ic_acc, under_ci_upper, over_ci_lower = evaluate_regression(regressor,
                                                                        X_test.to(device),
                                                                        y_test.to(device),
                                                                        samples=25,
                                                                        std_multiplier=3)
            print("CI acc: {:.2f}, CI upper acc: {:.2f}, CI lower acc: {:.2f}".format(ic_acc, under_ci_upper, over_ci_lower))
            print("Loss: {:.4f}".format(loss))
Below I provide a portion of the output. I print out the input dimensions before calling sample_elbo, and they are batch_size x no_variables (which is correctly printed out). However, inside the model's forward it should print the dimensions of each smaller batch taken by each GPU, so it should have printed 8 lines like 'In Model: input size torch.Size([2, 13])'. But apparently it is only putting the data on one GPU.
Could you please let me know what needs to be done for this to work on multiple GPUs? FYI: I used the methods suggested here to check whether multiple GPUs are actually being used for processing the input batch.
Let's use 8 GPUs!
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
Outside: input size torch.Size([16, 13])
In Model: input size torch.Size([16, 13])
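(A likely cause, offered as an assumption based on how nn.DataParallel works rather than a confirmed diagnosis: calling regressor.module.sample_elbo runs the forward pass on the unwrapped module, so DataParallel never gets a chance to scatter the batch. A sketch of computing the ELBO through the wrapped model instead, with an important caveat in the comments:)

# Hedged sketch: run the forward pass through the DataParallel wrapper so the
# batch is scattered across GPUs, and add the KL term from the underlying module.
# Caveat: blitz stores log_prior / log_variational_posterior as attributes set
# during forward, and DataParallel discards replica state after gathering, so
# the KL read from regressor.module may be stale - that composition issue may be
# the deeper reason the two don't work together out of the box.
sample_nbr = 3
loss = 0
for _ in range(sample_nbr):
    outputs = regressor(datapoints.to(device))           # scattered across GPUs
    loss = loss + criterion(outputs, labels.to(device))
    loss = loss + regressor.module.nn_kl_divergence() * (1 / X_train.shape[0])
loss = loss / sample_nbr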
Hi,
ImportError: cannot import name 'GaussianVariational' from 'blitz.modules.weight_sampler' (/opt/conda/lib/python3.7/site-packages/blitz/modules/weight_sampler.py)
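(A possible workaround, assuming the class was renamed in a later release; the model printouts elsewhere in these issues show both GaussianVariational and TrainableRandomDistribution for the same sampler:)

# Hedged workaround: newer blitz versions appear to expose the sampler class
# under the name TrainableRandomDistribution instead of GaussianVariational.
from blitz.modules.weight_sampler import TrainableRandomDistribution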
Hi, I have a question about using Blitz to make a prediction. Taking the Boston dataset as an example, I can get the loss and accuracy for the model, and I'm just wondering what I can do to make a prediction, i.e. load X_train again into the trained model to compare the predicted values to the true values.
Forgive my limited knowledge of neural networks, since I had never done anything in this field. : )
Best Regards
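(A minimal sketch of one way to do this, assuming the regressor, X_train, y_train, and device names from the Boston example; the number of samples is arbitrary:)

# Hedged sketch: predict by running multiple stochastic forward passes, then
# compare the mean prediction (and its spread) to the true values.
with torch.no_grad():
    preds = torch.stack([regressor(X_train.to(device)) for _ in range(100)])
mean_pred = preds.mean(dim=0)   # point prediction
std_pred = preds.std(dim=0)     # per-point uncertainty

for true, pred, std in zip(y_train[:5], mean_pred[:5], std_pred[:5]):
    print(f"true={true.item():.3f}  pred={pred.item():.3f} +/- {std.item():.3f}")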
Do you have any examples we could use for a Bayesian UNet (3D)? I'm really interested in using blitz for some research, so this would be helpful!
Thanks for an excellent and well thought out framework.
It looks like, from a model evaluation perspective, freeze/unfreeze have no role to play. To predict on a held-out set it is enough to set torch.no_grad() and model.eval().
After all, all trainable parameters, including those of the posterior distributions (rho and mu), are not affected by freeze/unfreeze (unless we call model.eval()).
Is that correct?
I was taking a look at the way the log likelihood is calculated for the variational Gaussian distribution, and in the second term you use self.sigma as opposed to torch.log(self.sigma).
As sigma is calculated by sigma = log(1 + exp(rho)), I'm not entirely sure whether another log function needs to be applied here or not - what do you think?
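(For reference, the standard Gaussian log-density, which is presumably the target expression here:

log N(w | mu, sigma^2) = -0.5 * log(2*pi) - log(sigma) - (w - mu)^2 / (2 * sigma^2)

so if the code is meant to compute the exact log-density, the sigma term enters as log(sigma) regardless of how sigma itself is parameterized; whether blitz intends an unnormalized variant is the open question in this issue.)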
Hi Piero, thank you very much for your post on bayesian LSTM. Is there any validation in this code?
Thank you.
Hi,
First of all, I very much appreciate your wonderful work!
Currently, I am testing the CIFAR10 classification task with your example code.
I would like to ask whether ~10% test accuracy is a usual case when I call freeze_() at test time.
# Test Time
with torch.no_grad():
    for data in test_loader:
        images, labels = data

        classifier.freeze_()
        outputs = classifier(images.to(device))

        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels.to(device)).sum().item()
    print(f'Freeze Epoch {epoch} | {str(100 * correct / total)}% | Elapsed: {time.time() - tic:.1f}s')
classifier.unfreeze_()
I only add classifier.freeze_() before getting the outputs.
I thought the accuracy should be similar to unfreeze_(); however, it seems not.
When I activate freeze mode I get 10%, but unfreeze mode reaches 45% in the first epoch.
Since I only activate it at test time, the training loss keeps going down.
Best Regards,
YJ
Hi!
If I run the model on the GPU, it causes:
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm
Do you have plans to add cuda support?
I use a BLSTM to make time-series predictions, but the loss becomes NaN after about 40 epochs. A standard LSTM can train on my dataset well. Do you know any possible reason? Thank you so much.
ERROR: Could not find a version that satisfies the requirement torch==1.4.0 (from blitz-bayesian-pytorch==0.2.5) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch==1.4.0 (from blitz-bayesian-pytorch==0.2.5)
Hi,
If I copy the BayesianConv2d class and replace the 2D conv with a 3D or 1D conv, will it function as an appropriate Bayesian 3D or 1D convolutional layer? At first glance it seems to be working.
Thanks,
Daniel
Hi there, thank you for your amazing implementation. When I was looking at your code I ran into a problem with posterior sharpening. In the file "base_bayesian_module.py" your posterior sharpening needs "loss" as input. In the file "lstm_bayesian_layer.py" you call posterior sharpening in the "forward" function with 3 conditions. Now, I was wondering how you calculate the posterior sharpening loss? I found the "forward_with_sharpening" function; however, this is not the way the original paper introduces it. Thanks in advance.
Hey,
sorry, I could not find the answer.
How can I get the uncertainty estimate for each weight in the BayesianLinear layer? I mean the std for each weight.
Thanks
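(A minimal sketch, assuming the attribute names weight_sampler.rho and bias_sampler.rho from blitz's weight_sampler module, and model.blinear1 as a hypothetical BayesianLinear layer:)

import torch

# Hedged sketch: blitz parameterizes the posterior std via rho, with
# sigma = log1p(exp(rho)) (a softplus), so the per-weight std can be
# recovered directly from the rho parameters.
layer = model.blinear1
weight_std = torch.log1p(torch.exp(layer.weight_sampler.rho))
bias_std = torch.log1p(torch.exp(layer.bias_sampler.rho))
print(weight_std.shape)   # one std per weight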
Thanks for this awesome work.
I'm working with GCN networks, and I would like to introduce uncertainty in the predictions. I have added linear bayesian layers on top of my convolutional network (last layers).
I'm getting good prediction results with uncertainty.
I'm not sure if this is a good way to introduce uncertainty, or whether I have to use weight uncertainty in every single layer of the network?
Hello, when calculating the variance, should we first denormalize the prediction results and then calculate the variance? In your example, it seems that the variance and mean are calculated first and then denormalized.
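(A short supporting note: for an affine inverse transform such as StandardScaler's, the two orders agree exactly, since the std simply rescales by the scaler's scale factor. A small demonstration with mock data; all names are local to the sketch:)

import numpy as np
from sklearn.preprocessing import StandardScaler

# "Rescale the std" vs "denormalize, then take the std" coincide for an
# affine inverse transform y = scale * y_norm + mean.
rng = np.random.default_rng(0)
y = rng.normal(50.0, 10.0, size=(200, 1))
scaler = StandardScaler().fit(y)

samples_norm = rng.normal(0.0, 1.0, size=(100, 1))          # mock normalized predictions
std_then_rescale = scaler.scale_ * samples_norm.std(axis=0)
rescale_then_std = scaler.inverse_transform(samples_norm).std(axis=0)
print(std_then_rescale, rescale_then_std)                    # identical up to float error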
Hi,
I'm a beginner trying to implement BayesianEmbedding. I have an array of embeddings of dimension 768 that I'm trying to feed into a feedforward Bayesian network to reduce the dimensionality to 1. I have a target to which I would like to compare the output with an RMSE criterion.
It would really help if you could provide me with a sample implementation like the one you’ve provided for Bayesian Linear Regression. Thank you very much.
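(A minimal sketch of one way to set this up, modeled on the regression examples in this repo; the architecture, sizes, and complexity_cost_weight are illustrative assumptions, not code from the library:)

import torch
import torch.nn as nn
from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

# Hedged sketch: a small Bayesian regressor mapping 768-d embedding vectors
# to a scalar target; RMSE = sqrt(MSE), so MSE is optimized directly.
@variational_estimator
class EmbeddingRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.blinear1 = BayesianLinear(768, 64)
        self.blinear2 = BayesianLinear(64, 1)

    def forward(self, x):
        return self.blinear2(torch.relu(self.blinear1(x)))

model = EmbeddingRegressor()
criterion = nn.MSELoss()
x = torch.randn(32, 768)   # mock batch of embeddings
y = torch.randn(32, 1)     # mock targets
loss = model.sample_elbo(inputs=x, labels=y, criterion=criterion,
                         sample_nbr=3, complexity_cost_weight=1e-4)
print(loss.item())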
Hello @piEsposito, thank you very much for this nice pythonic implementation of Bayesian neural nets!
Excuse me for this massive post; it contains my inference script (which is a bit lengthy).
I have used your training script example (blitz/examples/bayesian_LeNet_mnist.py) to train a Bayesian CNN on the MNIST dataset. Then I made an inference script using the trained weights:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms

from blitz.modules import BayesianLinear, BayesianConv2d
from blitz.losses import kl_divergence_from_nn
from blitz.utils import variational_estimator

import matplotlib.pyplot as plt
import numpy as np
import time

np.set_printoptions(formatter={'float_kind': '{:f}'.format})

train_dataset = dsets.MNIST(root="./data",
                            train=True,
                            transform=transforms.ToTensor(),
                            download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64,
                                           shuffle=True)

test_dataset = dsets.MNIST(root="./data",
                           train=False,
                           transform=transforms.ToTensor(),
                           download=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=64,
                                          shuffle=True)

def plot_uncertain_images(uncertain_images, uncertain_vars):
    sorted_vars = uncertain_vars.copy()
    sorted_vars.sort()
    highest_vars = sorted_vars[len(sorted_vars) - 20:]
    w = 10
    h = 10
    fig = plt.figure(figsize=(8, 8))
    columns = 4
    rows = 5
    for i in range(1, columns * rows + 1):
        fig.add_subplot(rows, columns, i)
        idx = uncertain_vars.index(highest_vars[i - 1])
        img = uncertain_images[idx]
        plt.imshow(img)
    plt.show()

@variational_estimator
class BayesianCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = BayesianConv2d(1, 6, (5, 5))
        self.conv2 = BayesianConv2d(6, 16, (5, 5))
        self.fc1 = BayesianLinear(256, 120)
        self.fc2 = BayesianLinear(120, 84)
        self.fc3 = BayesianLinear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

classifier = BayesianCNN()
classifier.load_state_dict(torch.load("./weights/epoch-66.pt"))
classifier.to(device)
classifier.eval()

samples = 100
correct = 0
predicted = 0
uncertain_vars = []
uncertain_images = []

## do the image inference on the test-dataset
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        batch_size = images.shape[0]
        predictions = np.zeros((batch_size, samples)).astype(np.uint8)
        for i in range(samples):
            outputs = classifier(images.to(device))
            probs = F.softmax(outputs.data, 1)
            preds = torch.argmax(probs, 1)
            predictions[:, i] = preds.detach().cpu().numpy()
        var = np.var(predictions, axis=1)
        for j in range(batch_size):
            pred_var = var[j]
            if pred_var == 0:
                # I'm sure about the prediction, so I'm going to predict
                predicted += 1
                prediction = int(np.percentile(predictions[j, :], 50))
                correct += (prediction == labels[j].numpy()).sum().item()
            else:
                # I'm not sure about the prediction, so I'll skip the prediction
                uncertain_vars.append(pred_var)
                img = np.multiply(images[j].permute(1, 2, 0).detach().cpu().numpy(), 255).astype(np.uint8)
                uncertain_images.append(img)

print('Accuracy of the network on the {0:d} predicted test images: {1:.2f} %'.format(predicted, (100 * correct / predicted)))
plot_uncertain_images(uncertain_images, uncertain_vars)

## do the same trick, but then on 64 randomly generated images ('noise')
batch_size = 64
predicted = 0
images_random = torch.rand(batch_size, 1, 28, 28)
labels_random = torch.randint(0, 10, (batch_size,))
predictions = np.zeros((batch_size, samples)).astype(np.uint8)
certain_labels = []
certain_images = []

for i in range(samples):
    outputs = classifier(images_random.to(device))
    probs = F.softmax(outputs.data, 1)
    preds = torch.argmax(probs, 1)
    predictions[:, i] = preds.detach().cpu().numpy()

var = np.var(predictions, axis=1)
for j in range(batch_size):
    pred_var = var[j]
    if pred_var == 0:
        # I'm sure about the prediction, so I'm going to predict
        predicted += 1
        prediction = int(np.percentile(predictions[j, :], 50))
        img = np.multiply(images_random[j].permute(1, 2, 0).detach().cpu().numpy(), 255).astype(np.uint8)
        certain_images.append(img)
        certain_labels.append(prediction)
    else:
        # I'm not sure about the prediction, so I'll skip the prediction
        pass

print('{0:d} predictions were made on {1:d} images with random noise'.format(predicted, batch_size))
for h in range(len(certain_images)):
    plt.imshow(certain_images[h])
    plt.title('Prediction: {:d}'.format(certain_labels[h]))
    plt.show()
My main question: what is a good "decision rule" to select the MNIST digits the BayesianCNN is less confident about?
As you can see in my script (pred_var == 0), I sampled the BayesianCNN 100 times and then rejected a digit when the variance of the 100 estimates exceeded 0 (meaning that the prediction is rejected when at least one of the estimates deviates from the rest). I have also done a sanity check by simulating 64 random-noise images, and then checking whether the BayesianCNN gives uniform predictions there...
This is one of the outputs that was generated:
"Accuracy of the network on the 9054 predicted test images: 99.96 %"
"5 predictions were made on 64 images with random noise"
Logically, these 5 predictions (all of digit '8', by the way) are much better than the 64 predictions that were made by a standard LeNet trained on MNIST (without the Bayesian layers).
Another question: the prediction of the BayesianCNN was done with the torch.nn.functional.softmax function (followed by selecting the highest probability of the softmax). However, I'm wondering if this softmax approach is correct... what would you advise? Is there a better probabilistic/Bayesian way?
Thanks in advance!
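(A hedged alternative decision rule, as one possible answer to the question above: instead of "variance of the argmax labels == 0", average the softmax probabilities over posterior samples and threshold the predictive entropy; the classifier and images names mirror the script above, and the threshold is an assumption to be tuned:)

import torch
import torch.nn.functional as F

# Hedged sketch: accept a digit only when the predictive entropy of the
# sample-averaged softmax is below a threshold; this uses the full probability
# vectors rather than only the argmax labels.
def entropy_decision(classifier, images, samples=100, threshold=0.5):
    with torch.no_grad():
        probs = torch.stack([F.softmax(classifier(images), dim=1)
                             for _ in range(samples)]).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    preds = probs.argmax(dim=1)
    accept = entropy < threshold   # reject high-entropy (uncertain) digits
    return preds, entropy, accept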
Hello, first of all amazing work, and thank you for this project!
I'm trying to train simple 3-layered NN and I encountered some problems I wanted to ask about. Here is my model:
BayesianRegressor(
  (blinear1): BayesianLinear(
    (weight_sampler): GaussianVariational()
    (bias_sampler): GaussianVariational()
    (weight_prior_dist): ScaleMixturePrior()
    (bias_prior_dist): ScaleMixturePrior()
  )
  (relu): ReLU()
  (blinear2): BayesianLinear(
    (weight_sampler): GaussianVariational()
    (bias_sampler): GaussianVariational()
    (weight_prior_dist): ScaleMixturePrior()
    (bias_prior_dist): ScaleMixturePrior()
  )
  (relu2): ReLU()
  (blinear3): BayesianLinear(
    (weight_sampler): GaussianVariational()
    (bias_sampler): GaussianVariational()
    (weight_prior_dist): ScaleMixturePrior()
    (bias_prior_dist): ScaleMixturePrior()
  )
)
I'm training it on a dataset with prices of flats/houses I recently scraped, and I've encountered a problem I cannot seem to fully understand: after a few epochs, the loss returned by the model.sample_elbo method is sometimes equal to nan, which, when backpropagated, breaks the whole training, as some of the weights are 'optimized' to nans:
model_copy.sample_elbo(inputs=datapoints.to(device),
                       labels=labels.to(device),
                       criterion=criterion,
                       sample_nbr=3,
                       complexity_cost_weight=1/X_train.shape[0])
I managed to track down where the incorrect value first appears, before the backpropagation of these nans, and it turned out that the value of log_prior in the first bayesian layer is sometimes equal to -inf:
first_layer = list(model_copy.modules())[0].blinear1
first_layer.log_prior  # returns -inf
Going further, I checked that the problem is in weight_prior_dist, which sometimes, about one in five times, returns -inf:
w = first_layer.weight_sampler.sample()  # sampled weights
prior_dist = first_layer.weight_prior_dist
print(prior_dist.log_prior(w))  # sometimes returns -inf
Going deeper, I realised that the problem is in the prior_pdf of the first prior distribution in the weight_prior_dist of the first layer. Some of the log-probabilities of the sampled weight values (prior_dist.dist1.log_prob(w)) are very small, around -100, and when passed through torch.exp such values underflow to 0. When these zeros then go through torch.log in prior_dist.log_prior(w) they become -inf, the mean becomes -inf as well, and that corrupts the further loss calculations:
prob_n1 = torch.exp(prior_dist.dist1.log_prob(w))  # minimal value of this tensor is equal to 0
if prior_dist.dist2 is not None:
    prob_n2 = torch.exp(prior_dist.dist2.log_prob(w))
prior_pdf = (prior_dist.pi * prob_n1 + (1 - prior_dist.pi) * prob_n2)  # minimal value of this tensor is equal to 0
(torch.log(prior_pdf)).mean()  # formula for calculating log_prior of weight_prior_dist, returns -inf
If I understand correctly, this means the probabilities of such sampled weights under the prior distribution are very, very small, approaching zero. Could you suggest a way of tackling this problem, so they remain very small rather than exactly zero? Or maybe the problem is different?
I'm still learning the details of Bayesian DL, so I hope there aren't too many silly mistakes. Thank you for any kind of help!
best regards
Rafał
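(A hedged sketch of the standard numerical fix, staying in log-space with logsumexp instead of the exp/log round trip, so log-probs around -100 never underflow to 0. The names pi, dist1, and dist2 mirror the attributes above; this is an illustration, not a patch to blitz:)

import torch

# Numerically stable scale-mixture log-density:
#   log( pi * p1(w) + (1 - pi) * p2(w) )
# = logsumexp( log(pi) + log p1(w), log(1 - pi) + log p2(w) )
def stable_mixture_log_prob(w, pi, dist1, dist2):
    log_p1 = torch.log(torch.tensor(pi)) + dist1.log_prob(w)
    log_p2 = torch.log(torch.tensor(1 - pi)) + dist2.log_prob(w)
    return torch.logsumexp(torch.stack([log_p1, log_p2]), dim=0).mean()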
I'm trying to train a Bayesian LSTM to predict remaining useful lifetime using windows of ten samples with roughly 600 features. I previously trained a conventional LSTM in TensorFlow and therefore rebuilt the architecture in PyTorch to be able to use blitz.
The problem is that when I train using the ELBO loss, the loss converges quickly (but not to zero, which is not a problem I suppose?) but the accuracy is not doing anything.
I also tried training using a normal cross entropy loss, which works perfectly, but I'm not sure if the 'bayesianity' is still valid then?
Another attempt was to freeze the model for the first part of training and then unfreeze and continue training. In that case the model converges (using the ELBO loss) and the accuracy improves, but once I unfreeze and continue training, the accuracy drops again.
Any experiences with that? Is there something I could change about the code? Is it even valid to check accuracy with Bayesian networks? I think it should be, because the network should still predict properly most of the time? (I'm completely new to Bayesian nets though and might not have understood everything fully...)
Hi, when I try to train a simple model using Blitz I get a huge memory demand (> 12 GB) for a very small dataset.
@variational_estimator
class BayesianLstm(Module):
    def __init__(self, output_size=1, input_size=24, hidden_size=32, seq_length=30, hidden_neurons=8, batch_size=512):
        super(BayesianLstm, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.seq_length = seq_length
        self.hidden_layer_size = hidden_size

        # First lstm cell
        self.lstm1 = BayesianLSTM(input_size, hidden_size)
        # second lstm cell
        self.lstm2 = BayesianLSTM(hidden_size, hidden_size * 2)

        # first fully connected layer
        self.fc1 = BayesianLinear(hidden_size * 2, hidden_neurons)
        self.act1 = nn.ReLU()
        # self.bat1 = nn.BatchNorm1d(num_features=hidden_neurons)
        self.drop = nn.Dropout(inplace=True, p=0.5)

        # second fully connected layer
        self.fc2 = BayesianLinear(hidden_neurons, hidden_neurons)
        self.act2 = nn.ReLU()
        # self.bat2 = nn.BatchNorm1d(num_features=hidden_neurons)

        # output
        self.output = BayesianLinear(hidden_neurons, output_size)
My data is of shape [batchsize, sequence_length, number_features]
I tried this for batchsize = 512, sequence_length= 30, number_features=24.
Hi, since word embedding is a crucial part of modern NLP, do you have any plans to support Bayesian word embeddings or variational word embeddings (see How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection )?
In the code below the log_prior is calculated by taking the mean of the log probabilities; however, the paper describes taking the product of these probabilities.
blitz-bayesian-deep-learning/blitz/modules/weight_sampler.py
Lines 61 to 72 in 566eaa4
Would it perhaps be preferable to use something like:
reduce(lambda a, b: a * b, log(prior_pdf))
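(A side note, for context: since the log of a product of densities equals the sum of their log-densities, the paper's product can be computed stably as a sum rather than an actual product of logs; a small self-contained sketch, with prior_pdf illustrative:)

import torch

# log(prod_i p_i) == sum_i log(p_i), which avoids underflow entirely.
prior_pdf = torch.tensor([0.2, 0.5, 0.1])
log_prior = torch.log(prior_pdf).sum()   # == log(0.2 * 0.5 * 0.1)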
Hi, I want to know what the parameter "future_length" in stock_blstm means. Thank you!
Hello. I'm experimenting with your package on binary classification.
But I checked the loss function and found that it was a negative float.
I checked your code and it's the log prior; can you say it's learning well?
in your code:
loss = criterion(outputs, labels) + self.nn_kl_divergence() * complexity_cost_weight
<-> criterion(outputs, labels) + (module.log_variational_posterior - module.log_prior) * complexity_cost_weight
in my code:
criterion = torch.nn.BCELoss()
loss = classifier.sample_elbo(inputs=datapoints.to(device),
                              labels=labels.to(device).float(),
                              criterion=criterion,
                              sample_nbr=3,
                              complexity_cost_weight=0.5)
loss.backward()
Thanks :)
log_posteriors = -log_sqrt2pi - torch.log(self.sigma) - (((w - self.mu) ** 2)/(2 * self.sigma ** 2)) - 0.5
Why is there a -0.5 at the end of the line? The log-likelihood of a Gaussian does not have that -0.5.
Thank you for your implementation. However, I stumbled across the initialization of prior_sigma_1, prior_sigma_2 and posterior_rho_init. I'm using blitz for an image-based regression task with a custom dataset.
How can I ensure that these parameters are set correctly?
Hi,
I am trying to use Blitz for a regression problem. Unfortunately, my nets don't seem to learn the variance of the data correctly. The variance in the predictions is so low that the true value is never within a reasonable confidence interval around the mean of the predictions. However, the means of the predictions are acceptable and clearly show that the nets are learning something. For better understanding, I attached a plot.
What I have tried so far:
• Different architectures with and without convolutional layers
• Changing prior_sigma_1, prior_sigma_2 and prior_pi
• Changing complexity_cost_weight in the sample_elbo method
Best regards
Lukas
Firstly, thanks for your implementation of Bayes by Backprop in Blitz. This is a very nice tool and has helped us a lot. However, I have a minor suggestion that I hope you can consider. It would be very helpful to not only return a total loss when training Bayesian layers, but to return two separate losses: the log likelihood and the KL divergence. This would make it easier to see how the trade-off between the two losses plays out, and would benefit the training process.
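(A hedged sketch of such a variant, mirroring the sample_elbo body quoted earlier in this page rather than the repo's actual API:)

# Returns the two ELBO components separately; the total loss is their sum.
def sample_elbo_parts(self, inputs, labels, criterion, sample_nbr, complexity_cost_weight=1):
    nll, kl = 0, 0
    for _ in range(sample_nbr):
        outputs = self(inputs)
        nll += criterion(outputs, labels)
        kl += self.nn_kl_divergence() * complexity_cost_weight
    return nll / sample_nbr, kl / sample_nbr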
It should be 2 ** (num_batches - batch_idx) / (2 ** num_batches - 1)
Hi,
I tried the Boston regression example, but the result is somewhat strange to me:
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31968.1094
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31707.7559
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31548.9590
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 31237.2090
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30955.3594
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30741.1934
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30444.9160
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 30193.9434
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 29945.0215
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 29647.4902
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 29413.7676
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 29227.4375
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 28922.7090
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 28641.3125
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 28439.4082
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 28159.0293
CI acc: 0.00, CI upper acc: 0.63, CI lower acc: 0.37
Loss: 27873.6797
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 27612.1309
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 27459.8281
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 27111.7598
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26903.3359
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26611.5840
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26382.6406
CI acc: 0.01, CI upper acc: 0.63, CI lower acc: 0.38
Loss: 26132.0469
Process finished with exit code 0
The accuracy is not improving. Could you please elaborate a little more on what is happening?
Thanks!
-Daniel