
badge's Issues

Minor MLP typo

Hey Jordan, lm2 isn't actually used in this MLP implementation:

badge/run.py

Line 191 in b170600

self.lm2 = nn.Linear(embSize, embSize)

This seems like a minor typo since the current functionality is consistent with the paper (two-layer perceptron: self.lm1 + self.linear), but figured I'd point it out.
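
For concreteness, a minimal sketch of what this describes (the names lm1, lm2, linear, and embSize follow the snippet; the rest of the class is an assumption, not the repository's code):

import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, dim, embSize, nClasses):
        super().__init__()
        self.lm1 = nn.Linear(dim, embSize)
        self.lm2 = nn.Linear(embSize, embSize)   # defined but never called in forward()
        self.linear = nn.Linear(embSize, nClasses)

    def forward(self, x):
        emb = F.relu(self.lm1(x))        # only lm1 and linear are used,
        return self.linear(emb), emb     # i.e. a two-layer perceptron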

Thanks

Large embedding vector, large number of classes?

Hi authors,
Thanks for publishing this fantastic tool. I have a question about the approach itself: when my last-layer dimension is really large and I have a large number of classes to cover, the gradient vector's dimension (embDim * nLab) will be on the order of 10^4 or more. Do you think BADGE is an efficient solution in this case?
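
For a rough sense of scale (the numbers below are illustrative assumptions, not taken from the issue):

# BADGE's gradient embedding has dimension embDim * nLab per unlabeled point.
embDim = 512     # assumed width of the penultimate layer
nLab = 1000      # assumed number of classes
print(embDim * nLab)  # 512000 -- each candidate point gets a ~5e5-dimensional vector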

license

Hi,
Could you add a license to this project? (If you want the most permissive option, the MIT license is probably what you're looking for.)

Entropy sampling issue

There is an error in your implementation of entropy sampling.

The following example shows the result produced by the implemented entropy sampling.

import torch

probs = torch.tensor([[0.1, 0.2, 0.7], [0.0, 0.1, 0.9], [0.0, 0.5, 0.5]])

log_probs = torch.log(probs)

U = (probs*log_probs).sum(1)

print(log_probs)
print(U)
---------
-- Print Result -- 
tensor([[-2.3026, -1.6094, -0.3567],
        [   -inf, -2.3026, -0.1054],
        [   -inf, -0.6931, -0.6931]])
tensor([-0.8018,     nan,     nan])
------------

If this issue is not addressed, the results of your implemented entropy sampling will inevitably be much worse than those of margin sampling.

This can be solved simply by adding the following code:

log_probs = torch.log(probs)

# zero out the infinite entries so that 0 * log(0) contributes 0 to the sum
log_probs[log_probs == float("-inf")] = 0
log_probs[log_probs == float("inf")] = 0
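
Put together, a self-contained version of the corrected computation looks like this (only the -inf masking is from the suggestion above; the rest just repeats the example):

import torch

probs = torch.tensor([[0.1, 0.2, 0.7], [0.0, 0.1, 0.9], [0.0, 0.5, 0.5]])

log_probs = torch.log(probs)
log_probs[log_probs == float("-inf")] = 0   # 0 * log(0) now contributes 0
U = (probs * log_probs).sum(1)              # negative entropy per example

print(U)  # tensor([-0.8018, -0.3251, -0.6931]) -- no more nan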

No pseudo/hypothetical labels found in the code

Hi Jordan Ash,

In your paper, the gradient embedding is from the loss between the network output and the hypothetical labels (inferred from the network output).

However, in your code, I didn't find anything about pseudo/hypothetical labels.

In the file badge_sampling.py, it seems that you directly use the true labels to guide your selection. If so, this would be an unfair comparison.

    gradEmbedding = self.get_grad_embedding(self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled]).numpy()

I'm not sure if I'm missing something. Could you show how you use the hypothetical labels in your code?
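
For reference, the construction described in the paper (hypothetical label = argmax of the network output, gradient taken with respect to the last linear layer) can be sketched as follows; the function and variable names here are illustrative, not the repository's:

import torch.nn.functional as F

def badge_grad_embedding(logits, emb):
    # logits: (N, nLab) network outputs; emb: (N, embDim) penultimate activations
    probs = F.softmax(logits, dim=1)
    y_hat = probs.argmax(dim=1)                        # hypothetical (pseudo) labels
    one_hot = F.one_hot(y_hat, probs.size(1)).float()
    # cross-entropy gradient w.r.t. the last-layer weights: (p - onehot(y_hat)) outer emb
    return ((probs - one_hot).unsqueeze(2) * emb.unsqueeze(1)).flatten(1)

No ground-truth labels enter this computation.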

Thanks,
Rui

Different results

Hi authors,
It's great that you published the source code, but the default hyper-parameters do not seem to be correctly tuned. Could you please share the hyper-parameters used to produce the graph below (taken from your paper)?
I have also attached a graph showing the results I got with the hyper-parameters from your source code, using the command
python run.py --model vgg --nQuery 100 --data CIFAR10 --alg badge

Reported in the paper: [accuracy plot omitted]
Run with the default code (the x-axis is the number of data points divided by 100, the query batch size): [accuracy plot omitted]
The gap is nearly 10% even with more data.

Different loss function

Hi! Great work and wonderful code, thanks!
Have you tried loss functions that are not based on cross-entropy? I am thinking of the loss functions used in unsupervised anomaly-detection methods for filtering out anomalous data points. One example of such a loss function is the difference between student and teacher intermediate layers. Am I right that, if the method only depends on the gradient, we could skip the hallucination part and use the derivative directly?
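
Roughly what I have in mind, as a sketch (the names are illustrative and this is only an assumption about how it could work, not the repository's code):

def grad_embedding_from_loss(model, last_layer, loss_fn, x):
    # loss_fn is any differentiable, label-free loss, e.g. a student/teacher
    # feature-matching distance; no hallucinated label is needed.
    model.zero_grad()
    loss = loss_fn(model(x))
    loss.backward()
    return last_layer.weight.grad.detach().flatten()
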
Have a nice day
Tobi

Utility of calculating gram matrix in k-means++ method

Hi Jordan Ash,

First of all, thank you for sharing such clear and concise code. It has been a delight to go through the code step by step and understand it.
I was curious about the last steps of the k-means++ code, which calculate a Gram matrix and then its eigenvalues. However, I saw no use of these values anywhere during sampling. I wanted to know whether this is intentional or whether something is missing from the k-means++ code.
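
For context, the k-means++ seeding used for selection only needs distances to the already-chosen points; a minimal sketch (illustrative, not the repository's code):

import numpy as np

def kmeans_pp_seed(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]           # first center: uniform at random
    d2 = ((X - X[centers[0]]) ** 2).sum(1)          # squared distance to nearest center
    for _ in range(k - 1):
        nxt = int(rng.choice(len(X), p=d2 / d2.sum()))   # sample proportionally to D^2
        centers.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(1))
    return centers

Nothing in this loop needs a Gram matrix or its eigenvalues, which is why I was wondering about those last steps.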

Thanks,
Shwetank

Got different accuracy results

Thank you for sharing the source code for this great work.

I am trying to replicate some of your baseline results.
However, the result I got from the baseline (LeastConfidence) is much better than the one reported in your paper.
Here is my accuracy plot (it reaches 40% test accuracy much earlier than the accuracy reported in the paper).

Query size: 100
Dataset: CIFAR-10
Network: Resnet
Strategy: LeastConfidence
[accuracy plot omitted]

Should I do something to replicate your results?

Typo in run.py

badge/run.py

Line 56 in b170600

if 'CIFAR' in opts.data: args.lamb = 1e-2

The 'args.lamb' should be 'opts.lamb'.
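
Presumably the intended line is:

if 'CIFAR' in opts.data: opts.lamb = 1e-2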
