riashat / deep-bayesian-active-learning

Code for Deep Bayesian Active Learning (ICML 2017)

Python 100.00%

deep-bayesian-active-learning's Introduction

Deep-Bayesian-Active-Learning

If you use this code for academic research, you are highly encouraged to cite the following paper: Yarin Gal, Riashat Islam, Zoubin Ghahramani. "Deep Bayesian Active Learning". ICML 2017. https://arxiv.org/pdf/1703.02910.pdf

Comments: Code is available for both Keras and Lasagne. If you are using Keras, make sure you use the local Keras version included in this repository.

deep-bayesian-active-learning's Issues

Question about Var Ratio acquisition function

Hi,

First, I would like to say great work, and thank you for making the code publicly available.

Second, I have a question regarding the var ratio acquisition function:

I have trouble understanding what the Var Ratio acquisition function does and how you implemented it. In your paper, Deep Bayesian Active Learning with Image Data, you defined it as:
VarRatio(x) = 1 - max_y { P[ y | x, Dtrain] }
I understand that this measures the lack of confidence because in the extreme case where the softmax gives equal probabilities to all classes we would have:
max_y { P[ y | x, Dtrain] } = 1 / C
and thus
VarRatio(x) = 1 - 1 / C
which is the maximum possible value.
But in our case we do 100 forward passes through the stochastic network and get 100 sets of predictions for each data point in the pool subset (2000 points). What, then, is P[ y | x, Dtrain]?

If I understood your code correctly, for every x in the pool subset you take the most frequently predicted class and define max_y { P[ y | x, Dtrain] } as the number of times that class was predicted divided by the total number of predictions (100 in this case).

Am I correct?

If I am correct, is there a particular reason why you chose this definition?
Is this the standard definition in the literature?
Is this different from averaging the scores and then taking the maximum?
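
To make the last question concrete, here is a minimal NumPy sketch of the two readings: a vote-based variation ratio (mode frequency over the stochastic passes) and a version that averages the softmax scores first and then takes the maximum. The function names and the toy inputs are assumptions for illustration; this is not the repository's code.

```python
import numpy as np

def var_ratio_votes(probs):
    """Vote-based variation ratio.

    probs: array of shape (T, N, C) holding softmax outputs from
    T stochastic forward passes over N points with C classes.
    """
    T, N, C = probs.shape
    votes = probs.argmax(axis=2)                 # (T, N): predicted class per pass
    counts = np.zeros((N, C), dtype=int)
    for c in range(C):
        counts[:, c] = (votes == c).sum(axis=0)  # how often class c was predicted
    mode_freq = counts.max(axis=1) / T           # frequency of the most voted class
    return 1.0 - mode_freq

def var_ratio_mean(probs):
    """Average the softmax outputs over passes, then take 1 - max."""
    mean_probs = probs.mean(axis=0)              # (N, C)
    return 1.0 - mean_probs.max(axis=1)

# Toy input: T=4 passes, N=2 points, C=2 classes.
# Point 0: every pass outputs [0.51, 0.49] (consistent but unconfident).
# Point 1: two passes output [0.9, 0.1], two output [0.1, 0.9] (passes disagree).
probs = np.array([
    [[0.51, 0.49], [0.9, 0.1]],
    [[0.51, 0.49], [0.9, 0.1]],
    [[0.51, 0.49], [0.1, 0.9]],
    [[0.51, 0.49], [0.1, 0.9]],
])

vr_votes = var_ratio_votes(probs)  # approximately [0.0, 0.5]
vr_mean = var_ratio_mean(probs)    # approximately [0.49, 0.5]
```

On this toy input the two definitions agree for the point where the passes disagree, but differ for the point where every pass gives the same low-confidence prediction, so they are not equivalent in general.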

Will multiple test passes take long?

Hi, in Dropout_Bald_Q10_N1000_Paper.py, lines 226 and 228, the program runs prediction on the pool data dropout_iterations times. If the whole training set is very large (i.e. the pool is very large), will this step take a very long time?
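
As a rough way to reason about this cost, the number of per-point forward passes in one acquisition step grows as the product of the pool size and the number of dropout iterations. The sketch below only does that arithmetic. The helper name is hypothetical, the 2000 and 100 figures are the ones quoted in the issues on this page, and the 50000-point pool is an illustrative assumption.

```python
def mc_dropout_forward_passes(pool_size, dropout_iterations):
    """Number of per-point forward passes for one acquisition step."""
    return pool_size * dropout_iterations

# Subsampled pool of 2000 points with 100 stochastic passes.
subset_cost = mc_dropout_forward_passes(2000, 100)

# Hypothetical full pool of 50000 points with the same 100 passes.
full_cost = mc_dropout_forward_passes(50000, 100)

# Ratio between scoring the full pool and scoring the subset.
ratio = full_cost / subset_cost
```

Under these assumed sizes, scoring the full pool costs 25 times as many forward passes as scoring the 2000-point subset.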

Random subsampling in Var Ratios and BALD

Hi,

In the Var Ratios and BALD acquisition functions, you first randomly sample a subset (2000 points) of the pool set and then select the points that maximize either Var Ratios or BALD within this subset. Is there any reason for that? Is it only to reduce the acquisition time? Sorry, I have not found any reference to this in the paper.

Thanks!
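
The procedure described in this question can be sketched as follows: draw a random subset of pool indices, score only that subset, and return the highest-scoring points. The function name, the `score_fn` callback, and the default of 10 queries are assumptions for illustration rather than the repository's actual interface.

```python
import numpy as np

def acquire_from_subset(pool_size, score_fn, subset_size=2000, n_queries=10, rng=None):
    """Pick n_queries pool indices with the highest score inside a random subset.

    score_fn is a hypothetical callback: it takes an array of pool indices
    and returns one acquisition score per index (higher means more informative).
    """
    rng = np.random.default_rng() if rng is None else rng
    subset_size = min(subset_size, pool_size)
    # Sample the candidate subset without replacement.
    subset_idx = rng.choice(pool_size, size=subset_size, replace=False)
    scores = np.asarray(score_fn(subset_idx))
    # Positions of the top n_queries scores, best first.
    top = np.argsort(scores)[-n_queries:][::-1]
    return subset_idx[top]

# Toy usage: the score of a point is simply its index, so the largest
# indices in the sampled subset should be selected.
rng = np.random.default_rng(0)
small_pool_pick = acquire_from_subset(50, lambda idx: idx.astype(float), rng=rng)
large_pool_pick = acquire_from_subset(5000, lambda idx: idx.astype(float), rng=rng)
```

Only `subset_size` points are scored per acquisition step, so the scoring cost no longer depends on the full pool size. The trade-off is that the best point in the full pool may not be in the sampled subset.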

How to run

Your work looks very interesting. Can you provide a cheat sheet showing how to run examples?

Thanks,
Jay Urbain

Is your model CNN or BCNN?

Hi, I have read your paper and code. In the file Dropout_Bald_Q10_N1000_Paper.py, your model appears to be a plain CNN in Keras, but in your paper the model is a BCNN with prior probability distributions placed over its weights. So how do you implement a BCNN in Keras? In particular, how is the training method implemented (Bayes by Backprop, for example)?

Besides, could you please tell me which code file is corresponding to section 5.2. Importance of model uncertainty in your paper (i.e. the code that gets results in Figure 2)?

Thanks!
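
For context, the paper obtains its approximate Bayesian CNN by keeping dropout active at prediction time (MC dropout) rather than by training with Bayes by Backprop. The NumPy sketch below illustrates that idea on a tiny two-layer network with fixed weights. The network, the weight shapes, and every name here are assumptions for illustration, not the Keras model from the repository.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W1, W2, p_drop=0.5, T=100, rng=None):
    """Run T stochastic forward passes with dropout left on.

    Returns an array of shape (T, N, C): one softmax output per pass.
    """
    rng = np.random.default_rng() if rng is None else rng
    outputs = []
    for _ in range(T):
        h = np.maximum(x @ W1, 0.0)           # hidden layer with ReLU
        mask = rng.random(h.shape) >= p_drop  # fresh dropout mask on every pass
        h = h * mask / (1.0 - p_drop)         # inverted-dropout scaling
        outputs.append(softmax(h @ W2))
    return np.stack(outputs)

# Toy usage with random inputs and weights (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))
mc_probs = mc_dropout_predict(x, W1, W2, T=20, rng=rng)
```

Each pass samples a different dropout mask, so the T outputs vary from pass to pass; acquisition functions such as BALD or Var Ratios are then computed from that set of outputs.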

about the tractable estimator

Hi, in the paper you prove that, as T goes to infinity, the estimate of the conditional mutual information approaches the true conditional mutual information between the output y and the parameters w. Why is this necessary? If I can derive an expression that is proportional to the conditional mutual information, can I use it to measure uncertainty in the BALD sense? Why or why not?
Thanks!
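
One way to explore the second question is to compute the Monte Carlo BALD estimate (entropy of the averaged prediction minus the average entropy of the individual predictions) and check how the ranking of pool points behaves under rescaling. The sketch below assumes softmax outputs of shape (T, N, C); the function names and the random toy data are illustrative assumptions.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -(p * np.log(p + eps)).sum(axis=axis)

def bald(probs):
    """Monte Carlo estimate of the mutual information between prediction and weights.

    probs: array of shape (T, N, C) from T stochastic forward passes.
    """
    mean_probs = probs.mean(axis=0)                 # (N, C) averaged prediction
    predictive_entropy = entropy(mean_probs)        # H[ mean_t p_t ]
    expected_entropy = entropy(probs).mean(axis=0)  # mean_t H[ p_t ]
    return predictive_entropy - expected_entropy

# Toy data: T=50 passes, N=30 points, C=4 classes drawn from a Dirichlet.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=(50, 30))

scores = bald(probs)
scaled = 3.7 * scores  # any positive constant factor

same_ranking = np.array_equal(np.argsort(scores), np.argsort(scaled))
```

Multiplying every score by the same positive constant leaves the ordering of the points unchanged, so the same points would be selected. That holds only if the factor is the same for every x; a factor that varies with x can change the ranking.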

Deterministic Var Ratios and deterministic BALD

I am having trouble understanding the deterministic Var Ratios and deterministic BALD acquisition functions. For example, in deterministic BALD, since the neural network is deterministic, isn't the mutual information equal to 0 for all points? Would this acquisition function then be equivalent to random acquisition? Thanks in advance!
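
The concern about deterministic BALD can be checked numerically. With a single deterministic forward pass (T = 1), the Monte Carlo estimate of the mutual information collapses to zero for every point, whereas 1 - max softmax still varies between points. This is a sketch under that single-pass assumption; it does not show how the repository defines its deterministic variants, and all names are illustrative.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -(p * np.log(p + eps)).sum(axis=axis)

def bald(probs):
    """probs: (T, N, C). Entropy of the mean prediction minus the mean entropy."""
    return entropy(probs.mean(axis=0)) - entropy(probs).mean(axis=0)

def max_softmax_var_ratio(probs):
    """1 - max class probability of the averaged prediction."""
    return 1.0 - probs.mean(axis=0).max(axis=1)

# A single deterministic pass: T=1, N=6 points, C=3 classes.
rng = np.random.default_rng(0)
single_pass = rng.dirichlet(np.ones(3), size=(1, 6))

det_bald = bald(single_pass)                 # zero for every point
det_vr = max_softmax_var_ratio(single_pass)  # still differs between points
```

With all scores equal, the selection depends entirely on how ties are broken, so it carries no information about the points.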
